<table class="tfo-notebook-buttons" align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/practicaldl/Practical-Deep-Learning-Book/blob/master/code/chapter-3/1-keras-custom-classifier-with-transfer-learning.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/practicaldl/Practical-Deep-Learning-Book/blob/master/code/chapter-3/1-keras-custom-classifier-with-transfer-learning.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

This code is part of [Chapter 3: Cats versus Dogs: Transfer Learning in 30 Lines with Keras](https://learning.oreilly.com/library/view/practical-deep-learning/9781492034858/ch03.html).

Note: In order to run this notebook on Google Colab you need to [follow these instructions](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb#scrollTo=WzIRIt9d2huC) so that the local data such as the images are available in your Google Drive.

In [1]:
try:
    import google.colab

    IS_COLAB_ENV = True
except ImportError:
    IS_COLAB_ENV = False
IS_COLAB_ENV

True

# Building a Custom Classifier in Keras with Transfer Learning

As promised, it’s time to build our state-of-the-art classifier in 30 lines or fewer! At a high level, we will follow the steps shown below:

- **Organize the data**: Download labeled images of cats and dogs from Kaggle. Then divide the images into training and validation folders.
- **Set up the configuration**: Define a pipeline for reading data, including preprocessing the images (e.g. resizing) and batching multiple images together.
- **Load and augment the data**: In the absence of a ton of training images, make small changes (augmentation) such as rotation and zooming to increase variation in the training data.
- **Define the model**: Take a pre-trained model, remove the last few layers, and append a new classifier layer. Freeze the weights of original layers (i.e. make them unmodifiable). Select an optimizer algorithm and a metric to track (like accuracy).
- **Train and test**: Start training for a few iterations. Save the model to eventually load inside any application for predictions.

## Downloading the dataset from Kaggle

We need to download our [dataset](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/download/train.zip) - the famous Cats vs Dogs dataset - from Kaggle.

Kaggle provides a command-line interface for downloading any dataset. Follow these steps:

1. Go to your Kaggle account (or create one if it does not exist).
2. Open `Settings` and scroll to the `API` section. If a token already exists, click `Expire Token`, then click `Create New Token`. This downloads a JSON file (`kaggle.json`) that contains the required Kaggle configuration.
3. Run the following commands to download the data.



In [2]:
# Install Kaggle CLI
!pip install -q kaggle

In [3]:
# Mount Google Drive
if IS_COLAB_ENV:
    from google.colab import drive

    drive.mount("/content/gdrive")

    # File upload prompt - upload your kaggle.json file here
    from google.colab import files

    files.upload()

    # Move the kaggle.json file and set necessary permissions
    !mkdir -p ~/.kaggle
    !cp kaggle.json ~/.kaggle/
    !chmod 600 /root/.kaggle/kaggle.json

Mounted at /content/gdrive


Saving kaggle.json to kaggle.json
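
The cell above only handles the Colab case. Outside Colab, the same setup can be sketched in plain Python. This is a minimal sketch, assuming the token was downloaded as `kaggle.json`; the helper name `install_kaggle_token` and its `home_dir` parameter are hypothetical and not part of the Kaggle CLI.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home_dir=None):
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner."""
    home = Path(home_dir) if home_dir else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(token_path, target)
    # Same intent as `chmod 600`: read/write access for the owner only.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

Calling `install_kaggle_token("kaggle.json")` from the folder that holds the downloaded token should leave the file where the `kaggle` command looks for it.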


In [4]:
# Use the kaggle CLI
!kaggle datasets list

ref                                                         title                                             size  lastUpdated          downloadCount  voteCount  usabilityRating  
----------------------------------------------------------  -----------------------------------------------  -----  -------------------  -------------  ---------  ---------------  
nelgiriyewithana/countries-of-the-world-2023                Global Country Information Dataset 2023           23KB  2023-07-08 20:37:33           9185        356  1.0              
juhibhojani/house-price                                     House Price                                        7MB  2023-08-02 16:51:21           1051         38  0.9411765        
arnavsmayan/netflix-userbase-dataset                        Netflix Userbase Dataset                          25KB  2023-07-04 07:38:41          10716        190  1.0              
alphiree/cardiovascular-diseases-risk-prediction-dataset    Cardiovascular Diseases Risk Predic

In [5]:
# Download the competition data
!kaggle competitions download -c dogs-vs-cats-redux-kernels-edition

Downloading dogs-vs-cats-redux-kernels-edition.zip to /content
 99% 803M/814M [00:05<00:00, 118MB/s]
100% 814M/814M [00:07<00:00, 112MB/s]


In [6]:
!unzip dogs-vs-cats-redux-kernels-edition.zip

Archive:  dogs-vs-cats-redux-kernels-edition.zip
  inflating: sample_submission.csv   
  inflating: test.zip                
  inflating: train.zip               


## Organize the data

Before training, we need to store our [downloaded dataset](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition/download/train.zip) in the right folder structure. We’ll work inside a `data` directory and divide the images into two sets: training and validation. Our directory structure will look something like this:

```
data
 |__train
 |    |__cat
 |    |__dog
 |__val
      |__cat
      |__dog
```

On Linux or macOS, the following commands create this directory structure:

In [7]:
!unzip train.zip
%mv train data
%cd data
%mkdir train val
%mkdir train/cat train/dog
%mkdir val/cat val/dog

Streaming output truncated to the last 5000 lines.
  inflating: train/dog.55.jpg        
  inflating: train/dog.550.jpg       
  inflating: train/dog.5500.jpg      
  inflating: train/dog.5501.jpg      
  inflating: train/dog.5502.jpg      
  inflating: train/dog.5503.jpg      
  inflating: train/dog.5504.jpg      
  inflating: train/dog.5505.jpg      
  inflating: train/dog.5506.jpg      
  inflating: train/dog.5507.jpg      
  inflating: train/dog.5508.jpg      
  inflating: train/dog.5509.jpg      
  inflating: train/dog.551.jpg       
  inflating: train/dog.5510.jpg      
  inflating: train/dog.5511.jpg      
  inflating: train/dog.5512.jpg      
  inflating: train/dog.5513.jpg      
  inflating: train/dog.5514.jpg      
  inflating: train/dog.5515.jpg      
  inflating: train/dog.5516.jpg      
  inflating: train/dog.5517.jpg      
  inflating: train/dog.5518.jpg      
  inflating: train/dog.5519.jpg      
  inflating: train/dog.552.jpg       
  inflating: train/dog.

The 25,000 files inside the data folder are prefixed with `cat` and `dog`. Now, move the files into their respective directories. To keep our initial experiment short, we’ll pick 250 files per class at random for training and another 250 files per class for validation. You can increase or decrease this number at any time to experiment with the trade-off between accuracy and speed.

Classification accuracy on previously unseen images (in the validation folder) is a good proxy for how the classifier would perform in the real world. Ideally, the more training images, the better the learning will be, and the more validation images, the better our estimate of real-world performance.

In [8]:
%ls | grep cat | sort -R | head -250 | xargs -I {} mv {} train/cat/
%ls | grep dog | sort -R | head -250 | xargs -I {} mv {} train/dog/
%ls | grep cat | sort | head -500 | tail -250 | xargs -I {} mv {} val/cat/
%ls | grep dog | sort | head -500 | tail -250 | xargs -I {} mv {} val/dog/
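
The shell pipeline above relies on Unix tools. A portable alternative can be sketched with the Python standard library. This is a minimal sketch, not the notebook's method: the function name `split_dataset` and its parameters are hypothetical, and unlike the shell version it draws both splits from a single shuffled list per class.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, per_class=250, classes=("cat", "dog"), seed=12345):
    """Move `per_class` random images per class into train/ and val/ subfolders."""
    src = Path(src_dir)
    rng = random.Random(seed)
    for name in classes:
        # Files are named like cat.0.jpg or dog.42.jpg.
        files = sorted(p for p in src.glob(f"{name}.*") if p.is_file())
        rng.shuffle(files)
        splits = {
            "train": files[:per_class],
            "val": files[per_class : 2 * per_class],
        }
        for split, chosen in splits.items():
            dest = src / split / name
            dest.mkdir(parents=True, exist_ok=True)
            for path in chosen:
                shutil.move(str(path), str(dest / path.name))
```

Run from inside the `data` folder, `split_dataset(".")` would fill `train/cat`, `train/dog`, `val/cat` and `val/dog` with 250 images each. Because the two slices never overlap, no image lands in both splits.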

## Set up the configuration

Let's start our Python program by importing the necessary packages.

In [9]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Input,
    Flatten,
    Dense,
    Dropout,
    GlobalAveragePooling2D,
)
from tensorflow.keras.applications.mobilenet import MobileNet, preprocess_input
import math

Let's place all the configuration values up front. These can be modified in the future based on the dataset of your choice.

In [10]:
TRAIN_DATA_DIR = "train/"
VALIDATION_DATA_DIR = "val/"
TRAIN_SAMPLES = 500
VALIDATION_SAMPLES = 500
NUM_CLASSES = 2
IMG_WIDTH, IMG_HEIGHT = 224, 224
BATCH_SIZE = 64

## Load and augment the data

Color images usually have three channels (red, green, and blue), each with intensity values ranging from 0 to 255. To normalize an image (i.e. bring the values between 0 and 1), we can divide each pixel by 255. Alternatively, we can use the `preprocess_input` function that Keras provides alongside each model, which applies the scaling that model expects (for MobileNet, values between -1 and 1).
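
To make the two scaling options concrete, here is a small NumPy sketch. It assumes that MobileNet's `preprocess_input` maps pixels to the range -1 to 1 via `x / 127.5 - 1`. The formula is written out by hand here rather than calling Keras, so treat it as an illustration of the idea.

```python
import numpy as np

# Three representative pixel intensities: darkest, mid-gray, brightest.
pixels = np.array([0.0, 127.5, 255.0])

# Option 1: rescale to the range [0, 1].
unit_scaled = pixels / 255.0

# Option 2: rescale to the range [-1, 1] (assumed MobileNet-style scaling).
centered = pixels / 127.5 - 1.0

print(unit_scaled)
print(centered)
```

Whichever option is used for training, the same one should be applied to images at prediction time, since the network only ever saw inputs in that range.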

In [11]:
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
)
val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

Time to load the data from its directories and let the augmentation happen!

A few key things to note:

- Training one image at a time can be pretty inefficient, so we can batch them into groups.
- To introduce more randomness during the training process, we’ll keep shuffling the images in each batch.
- To bring reproducibility during multiple runs of the same program, we’ll give the random number generator a seed value.
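
The three points above can be illustrated without Keras. The sketch below is an assumption-laden illustration, not how `flow_from_directory` is implemented: `shuffled_batches` is a hypothetical helper that shuffles sample indices with a fixed seed and cuts them into batches.

```python
import numpy as np


def shuffled_batches(num_samples, batch_size, seed):
    """Return index batches for one pass over the data, shuffled reproducibly."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)
    return [order[i : i + batch_size] for i in range(0, num_samples, batch_size)]


batches = shuffled_batches(num_samples=500, batch_size=64, seed=12345)
print(len(batches))      # 8 batches cover the 500 images
print(len(batches[-1]))  # the last batch holds the remaining 52 images
```

Calling the function twice with the same seed yields the same order, which is what makes repeated runs comparable.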

In [12]:
train_generator = train_datagen.flow_from_directory(
    TRAIN_DATA_DIR,
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=BATCH_SIZE,
    shuffle=True,
    seed=12345,
    class_mode="categorical",
)
validation_generator = val_datagen.flow_from_directory(
    VALIDATION_DATA_DIR,
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    batch_size=BATCH_SIZE,
    shuffle=False,
    class_mode="categorical",
)

Found 500 images belonging to 2 classes.
Found 500 images belonging to 2 classes.


Now that the data is taken care of, we come to the most crucial component of our training process: the model. We will reuse a CNN previously trained on the ImageNet dataset, remove the ImageNet-specific classifier in the last few layers, and replace it with our own classifier suited to our problem. For transfer learning, we’ll ‘freeze’ the weights of the original model, i.e. set those layers as unmodifiable, so that only the layers of the new classifier (that we add) can be modified. To keep things fast, we’ll choose the MobileNet model. Don’t worry about the specific layers for now. We’ll dig deeper into those details in [Chapter 4](https://learning.oreilly.com/library/view/practical-deep-learning/9781492034858/ch04.html).

## Define the model

In [13]:
def model_maker():
    # Load MobileNet without its ImageNet classifier head
    base_model = MobileNet(include_top=False, input_shape=(IMG_WIDTH, IMG_HEIGHT, 3))
    # Freeze the pre-trained layers so that only the new classifier is trained
    for layer in base_model.layers:
        layer.trainable = False
    inputs = Input(shape=(IMG_WIDTH, IMG_HEIGHT, 3))  # named to avoid shadowing the built-in `input`
    custom_model = base_model(inputs)
    custom_model = GlobalAveragePooling2D()(custom_model)
    custom_model = Dense(64, activation="relu")(custom_model)
    custom_model = Dropout(0.5)(custom_model)
    predictions = Dense(NUM_CLASSES, activation="softmax")(custom_model)
    return Model(inputs=inputs, outputs=predictions)
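
Freezing the base model leaves very little to train. As a rough back-of-the-envelope check, the sketch below counts the weights in the new classifier head. It assumes the MobileNet base (width multiplier 1.0) outputs 1,024 feature channels, which global average pooling reduces to a 1,024-value vector. Pooling and dropout add no weights.

```python
FEATURE_CHANNELS = 1024  # assumed output depth of the MobileNet base
HIDDEN_UNITS = 64        # size of the Dense layer added in model_maker()
NUM_CLASSES = 2          # cat and dog


def dense_params(num_inputs, num_units):
    """Weights plus biases of a fully connected layer."""
    return num_inputs * num_units + num_units


hidden = dense_params(FEATURE_CHANNELS, HIDDEN_UNITS)  # 65,600
output = dense_params(HIDDEN_UNITS, NUM_CLASSES)       # 130
trainable = hidden + output
print(trainable)  # 65730
```

Under these assumptions only about 66 thousand weights are updated during training, which helps explain why a few hundred images can be enough.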

## Train and test

With both the data and the model ready, all we have left to do is train the model. This is also known as fitting the model to the data. To train any model, we need to pick a loss function, an optimizer, an initial learning rate, and a metric. Let's discuss these briefly:

- **Loss function**: The loss function is the objective being minimized. For example, in a task to predict house prices, the loss function could be the mean squared error.
- **Optimizer**: This is an optimization algorithm that helps minimize the loss function. We’ll choose `Adam`, a widely used optimizer that usually converges quickly.
- **Learning rate**: This defines how quickly or slowly you update the weights during training. Choosing an optimal learning rate is crucial - a big value can cause the training process to jump around, missing the target. On the other hand, a tiny value can cause the training process to take ages to reach the target. We’ll keep it at 0.001 for now.
- **Metric**: Choose a metric to judge the performance of the trained model. Accuracy is a good explainable metric, especially when the classes are not imbalanced, i.e. roughly equal in size. Note that this metric is not used during training to maximize or minimize an objective.
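
The effect of the learning rate is easiest to see on a toy problem. The sketch below uses plain gradient descent (not Adam) to minimize `f(w) = w**2`. The function `gradient_descent` and the three rates are illustrative assumptions, and the specific values are meaningful only for this toy function, not for the network in this chapter.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w  # derivative of w**2
        w -= learning_rate * gradient
    return w


too_large = gradient_descent(1.1)    # overshoots the minimum further on every step
too_small = gradient_descent(0.001)  # barely moves away from the starting point
reasonable = gradient_descent(0.1)   # ends up very close to the minimum at 0

print(too_large, too_small, reasonable)
```

The minimum is at `w = 0`. With the large rate each update jumps past it and lands further away, while the tiny rate is still near the starting value of 1.0 after 50 steps.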

The training call below uses the term `epoch`. One epoch is a full pass of the network over the entire training dataset, and it usually consists of several mini-batches.

Run this program and let the magic begin. If you don’t have a GPU, brew a cup of coffee while you wait. You’ll notice 4 statistics - `loss` and `accuracy` on both the training and validation data. You are rooting for the `val_acc`.
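
To see where the `loss` and `accuracy` numbers come from, here is a small NumPy sketch that computes both for three made-up predictions. The helper functions are hand-written approximations of what Keras reports and may differ from it in small numerical details. The probabilities are invented for illustration.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-probability assigned to the correct class."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true, y_pred):
    """Fraction of rows whose highest-probability class is the correct one."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))


# Columns follow the class order used in this notebook: index 0 = cat, index 1 = dog.
y_true = np.array([[1, 0], [0, 1], [0, 1]])              # cat, dog, dog
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # made-up softmax outputs

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f"loss={loss:.3f} acc={acc:.3f}")
```

The third prediction is wrong, so accuracy is two out of three, and the loss is dominated by the low probability given to the correct class in that row.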

In [14]:
model = model_maker()
model.compile(
    loss="categorical_crossentropy",
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=["acc"],
)
model.fit(
    train_generator,
    steps_per_epoch=math.ceil(float(TRAIN_SAMPLES) / BATCH_SIZE),
    epochs=10,
    validation_data=validation_generator,
    validation_steps=math.ceil(float(VALIDATION_SAMPLES) / BATCH_SIZE),
)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf_no_top.h5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7a34c00afe20>

On our runs, all it took was 5 seconds in the very first epoch to reach 90% accuracy on the validation set, with just 500 training images. Whoa! And by the 10th epoch, we observe about 97% validation accuracy. That’s the power of transfer learning.

Without a model previously trained on ImageNet, reaching decent accuracy on this task would have required (1) a training time of anywhere between a couple of hours and a few days, and (2) far more data.

Before we forget, save the model you trained.

In [15]:
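import os

model_path = "model.h5"
if IS_COLAB_ENV:
    model_path = f"/content/gdrive/MyDrive/Practical-Deep-Learning-Book/code-outputs/chapter-3/{model_path}"
    # Create the output folder on Google Drive if it does not exist yet
    os.makedirs(os.path.dirname(model_path), exist_ok=True)
model.save(model_path)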

## Model Prediction

Now that you have a trained model, you can load it at any time from your application and classify an image. The Keras function `load_model`, as the name suggests, loads the model.

In [16]:
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image
import numpy as np

model = load_model(model_path)

Now let’s try loading our original sample images and see what results we get.

In [17]:
def download_sample_image(filename):
    import requests

    url = f"https://raw.githubusercontent.com/PracticalDL/Practical-Deep-Learning-Book/master/sample-images/{filename}"
    response = requests.get(url)
    response.raise_for_status()  # fail loudly instead of saving an error page
    with open(filename, "wb") as f:
        f.write(response.content)

In [18]:
IMG_PATH = "../../../sample-images/dog.jpg"
if IS_COLAB_ENV:
    IMG_PATH = "dog.jpg"
    download_sample_image(IMG_PATH)

img = image.load_img(IMG_PATH, target_size=(224, 224))
img_array = image.img_to_array(img)
expanded_img_array = np.expand_dims(img_array, axis=0)
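# Scale pixels to [0, 1]. Note: training used preprocess_input, so calling
# preprocess_input(expanded_img_array) here would match the training pipeline.
preprocessed_img = expanded_img_array / 255.0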
prediction = model.predict(preprocessed_img)
print(prediction)
print(validation_generator.class_indices)

[[0.00699834 0.99300164]]
{'cat': 0, 'dog': 1}


In [19]:
IMG_PATH = "../../../sample-images/cat.jpg"
if IS_COLAB_ENV:
    IMG_PATH = "cat.jpg"
    download_sample_image(IMG_PATH)

img = image.load_img(IMG_PATH, target_size=(224, 224))
img_array = image.img_to_array(img)
expanded_img_array = np.expand_dims(img_array, axis=0)
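# Scale pixels to [0, 1]. Note: training used preprocess_input, so calling
# preprocess_input(expanded_img_array) here would match the training pipeline.
preprocessed_img = expanded_img_array / 255.0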
prediction = model.predict(preprocessed_img)
print(prediction)
print(validation_generator.class_indices)

[[9.9992204e-01 7.7964731e-05]]
{'cat': 0, 'dog': 1}
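
The raw output is a row of probabilities, one per class, in the order given by `class_indices`. A small sketch of turning that row into a label follows. The values are copied from the output printed above for `cat.jpg`, and the variable names are illustrative.

```python
import numpy as np

class_indices = {"cat": 0, "dog": 1}  # as printed by the generator
prediction = np.array([[9.9992204e-01, 7.7964731e-05]])  # output shown for cat.jpg

# Invert the mapping so that an index can be looked up as a label.
index_to_label = {index: label for label, index in class_indices.items()}

best = int(np.argmax(prediction[0]))
label = index_to_label[best]
confidence = float(prediction[0][best])
print(label, confidence)
```

Here the highest probability sits at index 0, so the model's answer for this image is `cat`.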
