This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.

**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**

This notebook was generated for TensorFlow 2.6.

# Introduction to deep learning for computer vision

## Introduction to convnets

A basic convnet is a stack of `Conv2D` and `MaxPooling2D` layers.

**Instantiating a small convnet**

In [1]:
from tensorflow import keras
from tensorflow.keras import layers
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

**Displaying the model's summary**

In [2]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 128)         73856 
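
The `Param #` column above can be reproduced by hand. The following is a small sketch, assuming the usual Conv2D layout of one `kernel_size x kernel_size` kernel per (input channel, filter) pair plus one bias per filter; the helper name `conv2d_param_count` is made up for this illustration.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel_size * kernel_size * in_channels * filters + filters

# (kernel_size, in_channels, filters) for the three Conv2D layers of the model above
conv_layers = [(3, 1, 32), (3, 32, 64), (3, 64, 128)]
param_counts = [conv2d_param_count(*spec) for spec in conv_layers]
print(param_counts)  # [320, 18496, 73856]
```

The pooling layers contribute no parameters, which is why their rows show 0.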

**Training the convnet on MNIST images**

In [2]:
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255
model.compile(optimizer="rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fba601b0550>

**Evaluating the convnet**

In [3]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.3f}")

Test accuracy: 0.993


### The convolution operation

Dense layers learn global patterns in their input feature space, whereas
convolution layers learn local patterns.

The patterns they learn are translation-invariant.

They can learn spatial hierarchies of patterns.

Convolution preserves the spatial relationship between pixels by learning image features from small squares of input data (their size is set by the filter size).

Convolution: multiply each input patch elementwise by the filter and sum the
products.

Example: a 3x3 kernel (a 3x3x1 filter) applied with stride 1 to a 5x6 input image outputs a 3x4 feature map.
A fully connected layer mapping the same input to the same output would need 30x12 = 360 unshared weights (input size x output size), whereas the convolution reuses the same 9 weights at every position.
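
To make the 5x6 example concrete, here is a minimal NumPy sketch of the operation (stride 1, no padding). The helper name `conv2d_valid` and the toy image and kernel values are illustrative assumptions; this is a naive loop, not how Keras implements `Conv2D`.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and sum the elementwise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(patch * kernel)
    return output

image = np.arange(30, dtype="float32").reshape(5, 6)  # toy 5x6 "image"
kernel = np.ones((3, 3), dtype="float32")             # toy 3x3 filter
feature_map = conv2d_valid(image, kernel)

shared_weights = kernel.size                        # 9 weights reused at every position
unshared_weights = image.size * feature_map.size    # 30 * 12 for a dense mapping
print(feature_map.shape, shared_weights, unshared_weights)  # (3, 4) 9 360
```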

#### How does a convolution filter work?

Different values of the filter matrix produce different feature maps for the same input image.

A CNN learns the values of its filters during training.

The more filters there are, the more features are extracted.

Feature map

The size of a feature map is controlled by four parameters that we specify before training:

* filter size: size of each filter (height x width x number of channels)
* depth: number of filters (feature maps)
* stride: number of pixels by which we slide the filter matrix
* zero-padding: padding the input matrix with zeros around the border, for example so that the output has the same size as the input
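
The effect of filter size, stride, and padding on the spatial size can be sketched with the standard formula `floor((n + 2p - k) / s) + 1`, assuming `p` pixels of zero-padding on each side. The function name `conv_output_size` is hypothetical. Depth does not appear because it sets the number of output channels, not the height or width.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output length along one spatial axis for a given filter size, stride, and padding."""
    return (input_size + 2 * padding - filter_size) // stride + 1

no_padding = conv_output_size(28, 3)               # the map shrinks by 2 pixels
same_size = conv_output_size(28, 3, padding=1)     # one pixel of padding keeps the size
strided = conv_output_size(28, 3, stride=2)        # stride 2 roughly halves the size
print(no_padding, same_size, strided)  # 26 28 13
```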


ReLU (nonlinearity)

ReLU is applied for nonlinearity after every convolution operation.

It is an elementwise operation (applied per pixel) that replaces all negative values in the feature map with zero.
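
As a quick illustration, ReLU on a small feature map can be written with NumPy in one line. The array values are made up for the example.

```python
import numpy as np

feature_map = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])
activated = np.maximum(feature_map, 0.0)  # negatives become 0, positives pass through
print(activated)
```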

#### Understanding border effects and padding

#### Understanding convolution strides

### The max-pooling operation

Role of max pooling: to aggressively downsample feature maps

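Instead of transforming local patches via a learned linear transformation (the convolution kernel), max pooling transforms them via a hardcoded max tensor operation.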
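
A minimal NumPy sketch of 2x2 max pooling with stride 2, assuming a single-channel feature map. The helper name `max_pool_2x2` and the input values are illustrative; odd trailing rows or columns are dropped here, which matches the 11 -> 5 step in the summary above.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Take the max over non-overlapping 2x2 windows of a 2D array."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]    # drop an odd trailing row/column
    windows = trimmed.reshape(h // 2, 2, w // 2, 2)  # axes: (row block, row, col block, col)
    return windows.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 5],
                        [0, 1, 7, 2],
                        [3, 6, 1, 1]])
pooled = max_pool_2x2(feature_map)
print(pooled.tolist())  # [[4, 5], [6, 7]]
```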

We need the features from the last convolution layer to contain information about the totality of the input.

Without max pooling, the final feature map has 22 × 22 × 128 = 61,952 total coefficients per sample.

This is far too large for such a small model and would result in intense overfitting.
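
The 61,952 figure, and the size of the `Dense(10)` layer that would sit on top of it, can be checked with a few lines of arithmetic. This sketch assumes three unpadded 3x3 convolutions on a 28 × 28 input and no pooling.

```python
size = 28
for _ in range(3):        # three 3x3 convolutions, no padding, no pooling
    size = size - 3 + 1   # each one trims 2 pixels: 28 -> 26 -> 24 -> 22

coefficients = size * size * 128        # values per sample after flattening
dense_params = coefficients * 10 + 10   # weights plus biases of the final Dense(10)
print(size, coefficients, dense_params)  # 22 61952 619530
```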


**An incorrectly structured convnet missing its max-pooling layers**

In [4]:
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model_no_max_pool = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
model_no_max_pool.summary()

## Training a convnet from scratch on a small dataset

### The relevance of deep learning for small-data problems

Downloading a Kaggle dataset in Google Colaboratory

Access to the API is restricted to Kaggle users, so you need to authenticate yourself.
The kaggle package will look for your login credentials in a JSON file located at

~/.kaggle/kaggle.json.

First, you need to create a Kaggle API key and download it to your local machine:

Login -> My Account -> Account settings -> API section.

Click the Create New API Token button.

Second, go to your Colab notebook, and upload the API key JSON file to your
Colab session by running the following code in a notebook cell:

### Downloading the data

In [None]:
from google.colab import files
files.upload()

In [None]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle competitions download -c dogs-vs-cats

In [None]:
!unzip -qq train.zip

**Copying images to training, validation, and test directories**

In [None]:
import os, shutil, pathlib

original_dir = pathlib.Path("train")  # path to the directory where the original dataset was uncompressed
new_base_dir = pathlib.Path("cats_vs_dogs_small")  # directory where we will store our smaller dataset

# Utility function to copy cat and dog images from index start_index (inclusive) to index
# end_index (exclusive) into the subdirectory new_base_dir/{subset_name}/cat (and /dog).
# subset_name will be either "train", "validation", or "test".
def make_subset(subset_name, start_index, end_index):
    for category in ("cat", "dog"):
        dir = new_base_dir / subset_name / category
        os.makedirs(dir)
        fnames = [f"{category}.{i}.jpg" for i in range(start_index, end_index)]
        for fname in fnames:
            shutil.copyfile(src=original_dir / fname,
                            dst=dir / fname)

make_subset("train", start_index=0, end_index=1000)  # create the training subset with the first 1,000 images of each category
make_subset("validation", start_index=1000, end_index=1500)  # create the validation subset with the next 500 images of each category
make_subset("test", start_index=1500, end_index=2500)  # create the test subset with the next 1,000 images of each category
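
A quick sanity check of the split, written as a standalone sketch that only looks at the index ranges (it does not touch the file system). The dictionary name `subsets` is made up for this illustration.

```python
subsets = {"train": range(0, 1000),
           "validation": range(1000, 1500),
           "test": range(1500, 2500)}

# Two classes (cat, dog) per subset, so each subset holds twice the range length.
images_per_subset = {name: 2 * len(indices) for name, indices in subsets.items()}
print(images_per_subset)  # {'train': 2000, 'validation': 1000, 'test': 2000}

# No image index appears in more than one subset.
all_indices = [i for indices in subsets.values() for i in indices]
assert len(all_indices) == len(set(all_indices))
```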

### Building the model

**Instantiating a small convnet for dogs vs. cats classification**

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(180, 180, 3)) #the model expects RGB images of size 180 x 180
x = layers.Rescaling(1./255)(inputs) #rescale inputs to the [0,1] range by dividing them by 255
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
model.summary()

**Configuring the model for training**

In [None]:
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

### Data preprocessing

1. Read the picture files.
2. Decode the JPEG content to RGB grids of pixels.
3. Convert these into floating-point tensors.
4. Resize them to a shared size (we’ll use 180 × 180).
5. Pack them into batches (we’ll use batches of 32 images).
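
For the batching step, a little arithmetic shows how many batches one pass over the training subset yields. This sketch assumes 2,000 training images and that the last, incomplete batch is kept rather than dropped.

```python
import math

num_images = 2000   # 1,000 cats + 1,000 dogs in the training subset
batch_size = 32

num_batches = math.ceil(num_images / batch_size)
last_batch_size = num_images - (num_batches - 1) * batch_size
print(num_batches, last_batch_size)  # 63 16
```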

**Using `image_dataset_from_directory` to read images**

In [None]:
from tensorflow.keras.utils import image_dataset_from_directory

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

#### Understanding TF Dataset objects
TensorFlow makes available the `tf.data` API to create efficient input pipelines.

The `Dataset` class handles many key features that would otherwise be
cumbersome to implement yourself, in particular asynchronous data prefetching.

The `Dataset` class also exposes a functional-style API for modifying datasets.

In [None]:
import numpy as np
import tensorflow as tf
random_numbers = np.random.normal(size=(1000, 16))
# The from_tensor_slices() class method creates a Dataset from a NumPy array, or from a tuple or dict of NumPy arrays
dataset = tf.data.Dataset.from_tensor_slices(random_numbers)

In [None]:
#Yielding single samples
for i, element in enumerate(dataset):
    print(element.shape)
    if i >= 2:
        break

In [None]:
#we can use the .batch() method to batch the data
batched_dataset = dataset.batch(32)
for i, element in enumerate(batched_dataset):
    print(element.shape)
    if i >= 2:
        break

#### Range of useful dataset methods

* `.shuffle(buffer_size)`: shuffles elements within a buffer
* `.prefetch(buffer_size)`: prefetches a buffer of elements in GPU memory to achieve better device utilization
* `.map(callable)`: applies an arbitrary transformation to each element of the dataset
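
To see what "shuffling within a buffer" means, here is a plain-Python sketch of the idea: keep `buffer_size` elements in memory, emit one at random, and refill its slot with the next incoming element. The function `buffered_shuffle` is a hypothetical illustration of the concept, not TensorFlow's implementation.

```python
import random

def buffered_shuffle(iterable, buffer_size, seed=None):
    """Yield elements in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in iterable:
        if len(buffer) < buffer_size:
            buffer.append(item)           # still filling the buffer
            continue
        index = rng.randrange(buffer_size)
        yield buffer[index]               # emit a random buffered element...
        buffer[index] = item              # ...and put the new one in its slot
    rng.shuffle(buffer)                   # input exhausted: drain the rest
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
```

With a small buffer, elements can only move a short distance from their original position, so a buffer at least as large as the dataset is needed for a uniform shuffle. A buffer of size 1 leaves the order unchanged.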

In [None]:
#mapping
reshaped_dataset = dataset.map(lambda x: tf.reshape(x, (4, 4)))
for i, element in enumerate(reshaped_dataset):
    print(element.shape)
    if i >= 2:
        break

**Displaying the shapes of the data and labels yielded by the `Dataset`**

In [None]:
for data_batch, labels_batch in train_dataset:
    print("data batch shape:", data_batch.shape)
    print("labels batch shape:", labels_batch.shape)
    break

**Fitting the model using a `Dataset`**

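We use a `ModelCheckpoint` callback to save the model after each epoch. With `save_best_only=True` and `monitor="val_loss"`, the file is only overwritten when the validation loss improves.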

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="convnet_from_scratch.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)
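
With `save_best_only=True`, the checkpoint file is rewritten only when the monitored `val_loss` reaches a new minimum. A plain-Python sketch of that rule follows. The helper name is hypothetical and the loss values are made up for illustration; they are not results from this notebook.

```python
def epochs_that_trigger_a_save(val_losses):
    """Return the 1-based epochs at which the monitored value reaches a new minimum."""
    best = float("inf")
    saved_epochs = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:
            best = val_loss
            saved_epochs.append(epoch)
    return saved_epochs

# Made-up validation losses, for illustration only.
example_val_losses = [0.69, 0.65, 0.60, 0.62, 0.58, 0.61, 0.70]
saved = epochs_that_trigger_a_save(example_val_losses)
print(saved)  # [1, 2, 3, 5]
```

In this toy run the file on disk ends up holding the epoch 5 model, even though training continued into epochs where the validation loss got worse.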

**Displaying curves of loss and accuracy during training**

Plot the loss and accuracy of the model over the training and validation data

In [None]:
import matplotlib.pyplot as plt
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(accuracy) + 1)
plt.plot(epochs, accuracy, "bo", label="Training accuracy")
plt.plot(epochs, val_accuracy, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

We’ll reload the model from its saved file to evaluate it as it was before it started overfitting.

**Evaluating the model on the test set**

In [None]:
test_model = keras.models.load_model("convnet_from_scratch.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

Because we have relatively few training samples (2,000), overfitting will be our
number one concern.

We have already learned about dropout and weight decay (L2 regularization). We’re now going to
learn about data augmentation.

### Using data augmentation

Data augmentation takes the approach of generating more training data
from existing training samples by augmenting the samples via a number of
random transformations that yield believable-looking images

In Keras, this can be done by adding a number of data augmentation layers at
the start of your model.


**Define a data augmentation stage to add to an image model**

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

`RandomFlip("horizontal")` applies horizontal flipping to a random 50% of the images that go through it.

`RandomRotation(0.1)` rotates the input images by a random value in the range [-10%, +10%] (these are fractions of a full circle; in degrees, the range is [-36°, +36°]).

`RandomZoom(0.2)` zooms in or out of the image by a random factor in the range [-20%, +20%].
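
The `RandomRotation` factor is a fraction of a full circle, so converting it to an angle is a one-line calculation. The helper `rotation_range` below is a made-up name used only to show the arithmetic.

```python
import math

def rotation_range(factor):
    """Convert a rotation factor (a fraction of a full circle) to degrees and radians."""
    degrees = factor * 360.0
    radians = factor * 2.0 * math.pi
    return degrees, radians

degrees, radians = rotation_range(0.1)
print(f"+/-{degrees:.0f} degrees, +/-{radians:.3f} radians")  # +/-36 degrees, +/-0.628 radians
```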

#### Using data augmentation

**Displaying some randomly augmented training images**

In [None]:
plt.figure(figsize=(10, 10))
for images, _ in train_dataset.take(1): #we can use take(N) to only sample N batches from the dataset. This is equivalent to inserting a break in the loop after the Nth batch
    for i in range(9):
        augmented_images = data_augmentation(images) #apply the augmentation stage to the batch of images
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8")) #display the first image in the output batch. For each of the 9 iterations this is a different augmentation of the same image
        plt.axis("off")

**Defining a new convnet that includes image augmentation and dropout**

In [None]:
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)  # apply the data augmentation stage first
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)  # add dropout right before the final classifier
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
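
As a reminder of what `Dropout(0.5)` does during training, here is a NumPy sketch of inverted dropout: a random half of the activations is zeroed and the survivors are scaled by `1 / (1 - rate)` so the expected sum stays the same. The function name `inverted_dropout` and the all-ones input are illustrative assumptions. At inference time the layer passes its input through unchanged.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the entries and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((4, 8))
dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(dropped)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```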

**Training the regularized convnet**

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="convnet_from_scratch_with_augmentation.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=100,
    validation_data=validation_dataset,
    callbacks=callbacks)

**Evaluating the model on the test set**

In [None]:
test_model = keras.models.load_model(
    "convnet_from_scratch_with_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

We get a test accuracy of 83.5%.
It’s starting to look good!

## Leveraging a pretrained model

A common and highly effective approach to deep learning on small image datasets
is to use a pretrained model

A pretrained network is a saved network that was previously trained on a large
dataset.

Motivations:
Training and tuning a neural network from scratch takes lots of data, time, and
resources.

Reusing a pretrained network is a cheaper, faster way of adapting a neural network, exploiting its
generalization properties.

### Feature extraction with a pretrained model

A convnet used for image classification has two parts. The first part, a series of convolution and pooling layers, is called the convolutional base of the model; the second part is a densely connected classifier.

Representations learned by the convolutional base are likely to be more generic and more reusable.

Representations learned by the classifier will be specific to the data on which the model was trained.

List of image-classification models (all pretrained on the
ImageNet dataset) that are available as part of Keras:
Xception, Inception V3, ResNet50, VGG16, VGG19, MobileNet.
More are available from TensorFlow Hub.

**Instantiating the VGG16 convolutional base**

In [None]:
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(180, 180, 3))

In [None]:
conv_base.summary()

#### Fast feature extraction without data augmentation

We’ll start by extracting features as NumPy arrays by calling the `predict()`
method of the `conv_base` model on our training, validation, and test datasets.

**Extracting the VGG16 features and corresponding labels**

In [None]:
import numpy as np

def get_features_and_labels(dataset):
    all_features = []
    all_labels = []
    for images, labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(images)
        features = conv_base.predict(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)
    return np.concatenate(all_features), np.concatenate(all_labels)

train_features, train_labels =  get_features_and_labels(train_dataset)
val_features, val_labels =  get_features_and_labels(validation_dataset)
test_features, test_labels =  get_features_and_labels(test_dataset)

In [None]:
train_features.shape
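
The per-sample feature shape can be predicted by hand. This sketch assumes the standard VGG16 layout: its convolutions preserve the spatial size, and each of its five 2x2 max-pooling stages halves it (rounding down).

```python
size = 180
for _ in range(5):   # five max-pooling stages in the VGG16 convolutional base
    size //= 2       # 180 -> 90 -> 45 -> 22 -> 11 -> 5

feature_shape = (size, size, 512)   # the last VGG16 block has 512 channels
flattened = size * size * 512       # values per sample once flattened
print(feature_shape, flattened)     # (5, 5, 512) 12800
```

With 2,000 training images, `train_features.shape` is therefore expected to be `(2000, 5, 5, 512)`.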

Training is very fast because we only have to deal with two Dense layers

**Defining and training the densely connected classifier**

In [None]:
inputs = keras.Input(shape=(5, 5, 512))
x = layers.Flatten()(inputs) #note the use of the flatten layer before passing the features to a dense layer
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
      filepath="feature_extraction.keras",
      save_best_only=True,
      monitor="val_loss")
]
history = model.fit(
    train_features, train_labels,
    epochs=20,
    validation_data=(val_features, val_labels),
    callbacks=callbacks)

**Plotting the results**

In [None]:
import matplotlib.pyplot as plt
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

#### Feature extraction together with data augmentation

Create a new model that chains together: 1) a data augmentation stage, 2)
our frozen convolutional base, and 3) a dense classifier.

**Instantiating and freezing the VGG16 convolutional base**

In [None]:
conv_base  = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False)
conv_base.trainable = False
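
Setting `trainable = False` empties the model's list of trainable weights. The counts before and after freezing can be worked out by hand, assuming the standard VGG16 layout of 13 convolution layers with one kernel and one bias each.

```python
convs_per_block = [2, 2, 3, 3, 3]               # Conv2D layers in VGG16 blocks 1 to 5
num_conv_layers = sum(convs_per_block)
weights_before_freezing = 2 * num_conv_layers   # one kernel and one bias per Conv2D
weights_after_freezing = 0                      # a frozen model exposes no trainable weights
print(weights_before_freezing, weights_after_freezing)  # 26 0
```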

**Printing the list of trainable weights before and after freezing**

In [None]:
conv_base.trainable = True
print("This is the number of trainable weights "
      "before freezing the conv base:", len(conv_base.trainable_weights))

In [None]:
conv_base.trainable = False
print("This is the number of trainable weights "
      "after freezing the conv base:", len(conv_base.trainable_weights))

**Adding a data augmentation stage and a classifier to the convolutional base**

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)  # apply data augmentation
x = keras.applications.vgg16.preprocess_input(x)  # apply input value scaling
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="feature_extraction_with_data_augmentation.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=50,
    validation_data=validation_dataset,
    callbacks=callbacks)

**Evaluating the model on the test set**

In [None]:
test_model = keras.models.load_model(
    "feature_extraction_with_data_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

We get a test accuracy of 97.5%. This is only a modest improvement compared
to the previous test accuracy

### Fine-tuning a pretrained model

Fine-tuning consists of unfreezing a few of the top layers of a frozen model base used for feature extraction, and jointly training both the newly added part of the model (here, the densely connected classifier) and these top layers.

Steps:
1. Add your custom network on top of an already-trained base network.
2. Freeze the base network.
3. Train the part you added.
4. Unfreeze some layers in the base network.
5. Jointly train both these layers and the part you added.

In [None]:
conv_base.summary()

**Freezing all layers until the fourth from the last**

In [None]:
conv_base.trainable = True
for layer in conv_base.layers[:-4]:
    layer.trainable = False
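
To see which layers the `[:-4]` slice leaves trainable, the layer list can be rebuilt from the VGG16 naming scheme. This is a sketch that does not load the model; the name used for the input layer is a placeholder, since Keras numbers it per session.

```python
# Rebuild the layer names of the VGG16 convolutional base (include_top=False).
layer_names = ["input"]  # placeholder name for the input layer
for block, num_convs in enumerate([2, 2, 3, 3, 3], start=1):
    layer_names += [f"block{block}_conv{i}" for i in range(1, num_convs + 1)]
    layer_names.append(f"block{block}_pool")

frozen = layer_names[:-4]      # everything up to and including block4_pool
trainable = layer_names[-4:]   # the last four layers
print(trainable)  # ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

Since `block5_pool` has no weights, only the three `block5` convolution layers are actually updated during fine-tuning.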

**Fine-tuning the model**

In [None]:
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="fine_tuning.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)

In [None]:
model = keras.models.load_model("fine_tuning.keras")
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

## Summary

Convnets are the best type of machine-learning models for
computer-vision tasks.

On a small dataset, overfitting will be the main issue. Data augmentation is a powerful way to fight overfitting when you’re working with image data.

It’s easy to reuse an existing convnet on a new dataset via transfer learning.

As a complement to feature extraction, you can use fine-tuning.