# Image classification from scratch

## Introduction

This example shows how to do image classification from scratch, starting from JPEG image files on disk, without leveraging pre-trained weights or a pre-made Keras Application model. We demonstrate the workflow on the Kaggle Cats vs Dogs binary classification dataset.

We use the <code>image_dataset_from_directory</code> utility to generate the datasets, and we use Keras image preprocessing layers for image standardization and data augmentation.

## Setup

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

## Load the data: the Cats vs Dogs dataset

### Raw data download

First, let's download the 786M ZIP archive of the raw data.

In [None]:
#!curl -O https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip

In [None]:
#!unzip -q kagglecatsanddogs_3367a.zip
#!ls

In [None]:
!ls PetImages

### Filter out corrupted images

When working with lots of real-world image data, corrupted images are a common occurrence. Let's filter out badly encoded images that do not feature the string "JFIF" in their header.

In [None]:
import os
from os import listdir
from os.path import join

bad_img = 0
for folder_name in ("Cat", "Dog"):
    folder_path = join("PetImages", folder_name)
    print(f"Length {folder_name} = {len(listdir(folder_path))}")
    for fname in listdir(folder_path):
        fpath = join(folder_path, fname)
        # open in binary reading mode; the context manager closes the file
        # even if reading raises an exception
        with open(fpath, "rb") as fobj:
            is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)

        if not is_jfif:
            bad_img += 1
            # delete corrupted image
            os.remove(fpath)

print(f"Deleted images: {bad_img}")
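
To see why the check only needs the first few bytes, here is a minimal sketch using only the standard library. It assumes the usual JFIF layout, in which the ASCII marker "JFIF" sits at byte offsets 6 to 9 of the file. The helper name `has_jfif_marker` and the sample byte strings are made up for illustration and are not part of the dataset.

```python
import os
import tempfile


def has_jfif_marker(path, n_bytes=10):
    """Return True if b"JFIF" appears in the first n_bytes of the file."""
    with open(path, "rb") as fobj:
        return b"JFIF" in fobj.read(n_bytes)


# Hypothetical file contents: a JFIF-style header and an unrelated payload.
good_bytes = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01" + b"\x00" * 16
bad_bytes = b"not an image at all"

with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.jpg")
    bad_path = os.path.join(tmp_dir, "bad.jpg")
    with open(good_path, "wb") as f:
        f.write(good_bytes)
    with open(bad_path, "wb") as f:
        f.write(bad_bytes)

    good_result = has_jfif_marker(good_path)
    bad_result = has_jfif_marker(bad_path)

print(good_result, bad_result)
```

Note that `peek()` may return more or fewer bytes than requested, whereas `read(10)` in this sketch inspects at most the first ten bytes.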

## Generate Dataset

In [None]:
image_size = (180, 180)
batch_size = 32

# training dataset
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "PetImages",
    validation_split=0.2,
    subset="training",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
)

# validation dataset
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "PetImages",
    validation_split=0.2,
    subset="validation",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
)
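
Both calls use the same `seed` and `validation_split` so that the two subsets are cut from the same shuffled file list and do not overlap. The sketch below illustrates that idea in plain Python. It is a simplified stand-in, not the Keras implementation, and `split_files` and the file names are hypothetical.

```python
import random


def split_files(files, validation_split, subset, seed):
    """Shuffle deterministically, then slice off the requested subset."""
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)
    num_val = int(validation_split * len(shuffled))
    if subset == "validation":
        return shuffled[len(shuffled) - num_val:]
    return shuffled[:len(shuffled) - num_val]


files = [f"img_{i}.jpg" for i in range(10)]
train_files = split_files(files, 0.2, "training", seed=1337)
val_files = split_files(files, 0.2, "validation", seed=1337)

print(len(train_files), len(val_files))  # 8 2
```

With different seeds the two shuffles would differ, and the same file could land in both subsets.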

## Visualise the dataset

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(int(labels[i]))
        plt.axis("off")

## Using image data augmentation

When you don't have a large image dataset, it's a good practice to artificially introduce sample diversity by applying random yet realistic transformations to the training images, such as random horizontal flipping or small random rotations. This helps expose the model to different aspects of the training data while slowing down overfitting.

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
    ]
)

Now let's visualise the result of the data augmentation by applying it repeatedly to the first image of the first batch.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        plt.axis("off")
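
As a rough picture of what a random horizontal flip does, the sketch below mirrors a tiny image stored as nested lists. It is plain Python for illustration only: `random_horizontal_flip` is a hypothetical helper, whereas the `RandomFlip` layer operates on batched tensors.

```python
import random


def random_horizontal_flip(image, rng, probability=0.5):
    """Mirror the image left-to-right with the given probability."""
    if rng.random() < probability:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


image = [
    [1, 2, 3],
    [4, 5, 6],
]
# probability=1.0 always flips, probability=0.0 never does
flipped = random_horizontal_flip(image, random.Random(0), probability=1.0)
unchanged = random_horizontal_flip(image, random.Random(0), probability=0.0)
print(flipped)    # [[3, 2, 1], [6, 5, 4]]
print(unchanged)  # [[1, 2, 3], [4, 5, 6]]
```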

## Standardizing the data

Our images are already in a standard size (180x180), as they are being yielded as contiguous float32 batches by our dataset. However, their RGB channel values are in the [0, 255] range. This is not ideal for a neural network; in general you should seek to make your input values small. Here, we will standardize values to be in the [0, 1] range by using a Rescaling layer at the start of our model.
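
The arithmetic behind the rescaling is a single multiplication by 1/255. A minimal sketch in plain Python, with made-up pixel values and a hypothetical `rescale` helper:

```python
def rescale(values, scale=1.0 / 255):
    """Multiply every value by scale, mapping [0, 255] onto [0, 1]."""
    return [v * scale for v in values]


pixels = [0, 51, 255]  # hypothetical channel values
scaled = rescale(pixels)
print(scaled)  # approximately [0.0, 0.2, 1.0]
```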

## Two options to preprocess the data

There are two ways you could be using the data_augmentation preprocessor.

### Option 1 - Make it part of the model

In [None]:
# the input shape needs the channel dimension: (180, 180, 3)
inputs = keras.Input(shape=image_size + (3,))
x = data_augmentation(inputs)
# rescale pixel values from [0, 255] to [0, 1]
x = layers.Rescaling(1.0 / 255)(x)

With this option, your data augmentation will happen on device, synchronously with the rest of the model execution, meaning that it will benefit from GPU acceleration. Note that data augmentation is inactive at test time, so the input samples will only be augmented during <code>fit()</code>, not when calling <code>evaluate()</code> or <code>predict()</code>.

<b>If you're training on GPU, this is the better option.</b>

### Option 2 - Apply it to the dataset

This option produces a dataset that yields batches of augmented images.

In [None]:
augmented_train_ds = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))

In [None]:
plt.figure(figsize=(10, 10))
for images, _ in augmented_train_ds.take(1):
    # the batch is already augmented by the map() above,
    # so plot the first nine images directly
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.axis("off")

With this option, your data augmentation will happen on CPU, asynchronously, and will be buffered before going into the model. <b>If you're training on CPU, this is the better option</b>, since it makes data augmentation asynchronous and non-blocking.

For now, let's choose the first option.

## Configure the dataset for performance

Let's make sure to use buffered prefetching so we can yield data from disk without I/O becoming blocking.

In [None]:
train_ds = train_ds.prefetch(buffer_size=32)
val_ds = val_ds.prefetch(buffer_size=32)
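
Conceptually, prefetching lets a background worker prepare the next batches while the model is busy with the current one. The generator below is a standard-library sketch of that idea, not how `tf.data` is implemented. `prefetch` here is a hypothetical helper, and it does no error handling.

```python
import queue
import threading


def prefetch(iterable, buffer_size):
    """Yield items from iterable while a background thread reads ahead."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the input

    def producer():
        for item in iterable:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()  # blocks only if nothing is ready yet
        if item is sentinel:
            return
        yield item


batches = list(prefetch(range(5), buffer_size=2))
print(batches)  # [0, 1, 2, 3, 4]
```

The bounded queue plays the role of `buffer_size`: the producer runs at most that many items ahead of the consumer.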

## Build a model

We'll build a small version of the Xception network. We haven't particularly tried to optimize the architecture; if you want to do a systematic search for the best model configuration, consider using KerasTuner.

Note that:

- We start the model with the <code>data_augmentation</code> preprocessor, followed by a Rescaling layer.
- We include a Dropout layer before the final classification layer.

In [None]:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # image augmentation block
    x = data_augmentation(inputs)

    # entry block
    x = layers.Rescaling(1.0 / 255)(x)
    x = layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.Conv2D(64, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    previous_block_activation = x  # set aside residual

    for size in [128, 256, 512, 728]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # project residual
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(previous_block_activation)
        x = layers.add([x, residual])  # add back residual
        previous_block_activation = x  # set aside next residual

    x = layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes

    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(units, activation=activation)(x)
    return keras.Model(inputs, outputs)
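
To check that the residual additions line up, it helps to trace the spatial size through the network. With `padding="same"`, a layer with stride s maps a side length n to ceil(n / s), so the strided 1x1 convolution on the residual branch and the max pooling on the main branch produce matching shapes. The sketch below applies that rule to the 180x180 input. `same_padding_output` is a hypothetical helper, not a Keras function.

```python
import math


def same_padding_output(size, stride):
    """Output side length of a "same"-padded conv or pooling layer."""
    return math.ceil(size / stride)


size = 180
sizes = [size]
size = same_padding_output(size, 2)  # entry Conv2D with strides=2
sizes.append(size)
for _ in [128, 256, 512, 728]:
    size = same_padding_output(size, 2)  # MaxPooling2D with strides=2
    sizes.append(size)

print(sizes)  # [180, 90, 45, 23, 12, 6]
```

Under this rule, the feature map entering `GlobalAveragePooling2D` is 6x6 with 1024 channels.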

In [None]:
# make model
model = make_model(input_shape=image_size + (3,), num_classes=2)

In [None]:
# plot model
keras.utils.plot_model(model, show_shapes=True)

## Train the model

In [None]:
epochs = 50

callbacks = [keras.callbacks.ModelCheckpoint("save_at_{epoch}.h5")]
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=epochs, callbacks=callbacks, validation_data=val_ds)
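
The model ends in a single sigmoid unit, so the loss is binary cross-entropy on the predicted probability of class 1. As a worked illustration of the formula, here is a plain-Python sketch with made-up numbers. Keras computes this on tensors and may differ in numerical details such as clipping.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]         # hypothetical labels
confident = [0.9, 0.1]  # predictions close to the labels
uncertain = [0.5, 0.5]  # predictions at chance level

confident_loss = binary_crossentropy(labels, confident)
uncertain_loss = binary_crossentropy(labels, uncertain)
print(confident_loss, uncertain_loss)
```

Predictions at chance level give a loss of log(2), about 0.693, which is a useful baseline when reading the training logs.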