# Training a *convnet* from Scratch on a Small Dataset

Having to train an image-classification model using very little data is a common situation, which you’ll likely encounter in practice if you ever do computer vision in a professional context. In this section, we’ll review one basic strategy to tackle this problem: training a new model from scratch using what little data you have. We’ll start by naively training a small *convnet* without any regularization. At that point, the main issue will be overfitting. Then we’ll introduce *data augmentation*, a powerful technique for mitigating overfitting in computer vision.

After downloading the dataset from `https://www.microsoft.com/en-us/download/details.aspx?id=54765`, we'll copy a small subset of it into separate training, validation, and test directories and train our *convnet* on that subset only.

In [None]:
import kaggle

In [1]:
# Copying images to training, validation, and test directories
import os, shutil, pathlib

original_dir = pathlib.Path("/mnt/0A2AAC152AABFBB7/sideProjects/data/kagglecatsanddogs_5340/PetImages")
new_base_dir = pathlib.Path("/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/data/cats_vs_dogs_small")

def make_subset(subset_name, start_index, end_index):
    # Copy images start_index..end_index-1 of each category into
    # new_base_dir/subset_name/category.
    for category in ("Cat", "Dog"):
        # Named target_dir to avoid shadowing the built-in dir()
        target_dir = new_base_dir / subset_name / category
        print(target_dir)
        os.makedirs(target_dir, exist_ok=True)
        fnames = [f"{i}.jpg" for i in range(start_index, end_index)]
        for fname in fnames:
            shutil.copyfile(src=original_dir / category / fname,
                            dst=target_dir / fname)

make_subset("train", start_index=0, end_index=1000)
make_subset("validation", start_index=1000, end_index=1500)
make_subset("test", start_index=1500, end_index=2500)

/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/data/cats_vs_dogs_small/train/Cat
/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/data/cats_vs_dogs_small/train/Dog
/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/data/cats_vs_dogs_small/validation/Cat
/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/data/cats_vs_dogs_small/validation/Dog
/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/data/cats_vs_dogs_small/test/Cat
/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/data/cats_vs_dogs_small/test/Dog
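As a quick sanity check on the split, the index ranges passed to `make_subset` can be tallied in plain Python. This is only a sketch: the `splits` dictionary and the helper `count_images` are illustrative names that do not appear in the notebook, and the tally assumes that every index in each range exists as a file for both categories.

```python
# Index ranges used in the make_subset calls (the end index is exclusive).
splits = {
    "train": (0, 1000),
    "validation": (1000, 1500),
    "test": (1500, 2500),
}
categories = ("Cat", "Dog")

def count_images(start_index, end_index, n_categories=len(categories)):
    """Number of files copied for one subset, summed over all categories."""
    return (end_index - start_index) * n_categories

sizes = {name: count_images(*bounds) for name, bounds in splits.items()}
print(sizes)

# Consecutive ranges may touch but must not overlap, so that no image
# ends up in more than one subset.
ordered = sorted(splits.values())
no_overlap = all(prev[1] <= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))
print("no overlap:", no_overlap)
```

Under these assumptions the subsets hold 2,000 training, 1,000 validation, and 2,000 test images, split evenly between the two classes.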


We will reuse the same general model structure you saw in the first example: the convnet will be a stack of alternating `Conv2D` (with `relu` activation) and `MaxPooling2D` layers. But because we’re dealing with bigger images and a more complex problem, we’ll make our model larger, accordingly: it will have two more `Conv2D` and `MaxPooling2D` stages. This serves both to increase the capacity of the model and to further reduce the size of the feature maps so they aren’t overly large when we reach the `Flatten` layer.

Because we’re looking at a binary-classification problem, we’ll end the model with a single unit (a `Dense` layer of size 1) and a `sigmoid` activation. This unit will encode the probability that the model is looking at one class or the other. One last small difference: we will start the model with a `Rescaling` layer, which will rescale image inputs (whose values are originally in the [0, 255] range) to the [0, 1] range.
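To make the two ends of the model concrete, here is a minimal pure-Python sketch of what the `Rescaling` layer and the final `sigmoid` unit compute on single numbers. The helper names (`rescale`, `sigmoid`, `predict_class`) and the 0.5 decision threshold are illustrative assumptions. The label encoding is also an assumption: it supposes class indices follow the alphabetical order of the directory names (`Cat` is 0, `Dog` is 1), in which case the sigmoid output would be read as the probability of `Dog`.

```python
import math

def rescale(pixel_value, scale=1.0 / 255):
    """Map a raw pixel intensity in [0, 255] to [0, 1], as Rescaling(1./255) does."""
    return pixel_value * scale

def sigmoid(logit):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# Assumed label encoding: directory names sorted alphabetically.
class_names = ["Cat", "Dog"]

def predict_class(logit, threshold=0.5):
    """Turn the single sigmoid output into a class name (threshold is illustrative)."""
    probability_of_class_1 = sigmoid(logit)
    return class_names[int(probability_of_class_1 > threshold)]

print([rescale(v) for v in (0, 128, 255)])
for logit in (-2.0, 0.0, 2.0):
    print(logit, round(sigmoid(logit), 3), predict_class(logit))
```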

In [2]:
# Instantiating a small convnet for dogs vs. cats classification
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(180, 180, 3))
x = layers.Rescaling(1./255)(inputs)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)

outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.summary()

2024-05-10 15:11:40.055884: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-10 15:11:40.059822: I external/local_tsl/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-10 15:11:40.107512: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
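The shrinking feature maps can be traced by hand without running TensorFlow. The sketch below assumes the Keras defaults used in the model: `Conv2D` with a 3 × 3 kernel, stride 1, and `"valid"` padding, and `MaxPooling2D` with a 2 × 2 window whose stride equals the pool size. The helper names are illustrative.

```python
def conv_output_size(size, kernel_size=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel_size + 1

def pool_output_size(size, pool_size=2):
    """Spatial size after max pooling with stride equal to pool_size."""
    return size // pool_size

size = 180
filters_per_stage = [32, 64, 128, 256, 256]
shapes = []
for stage, filters in enumerate(filters_per_stage):
    size = conv_output_size(size)
    shapes.append(("conv", size, size, filters))
    # The last Conv2D is followed by Flatten, not by a pooling layer.
    if stage < len(filters_per_stage) - 1:
        size = pool_output_size(size)
        shapes.append(("pool", size, size, filters))

for shape in shapes:
    print(shape)

flattened = size * size * filters_per_stage[-1]
print("Flatten output length:", flattened)
```

Under these assumptions the spatial size goes from 180 down to 7, so the `Flatten` layer hands 7 × 7 × 256 = 12,544 values to the final `Dense` unit.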


For the compilation step, we’ll go with the `RMSprop` optimizer, as usual. Because we ended the model with a single sigmoid unit, we’ll use binary crossentropy as the loss function.

In [3]:
# Configuring the model for training
model.compile(
    loss="binary_crossentropy",
    optimizer="rmsprop",
    metrics=["accuracy"],
)
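For a single image with label `y` (0 or 1) and predicted probability `p`, binary crossentropy is `-(y * log(p) + (1 - y) * log(1 - p))`. The following pure-Python sketch spells this out. The clipping constant `eps` is an assumption added here to keep the logarithm finite, and the function names are illustrative rather than the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one example; y_pred is clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

def mean_binary_crossentropy(labels, probabilities):
    """Average loss over a batch of (label, probability) pairs."""
    losses = [binary_crossentropy(y, p) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)

# A confident correct prediction, an undecided one, and a confident wrong one.
for y, p in [(1, 0.9), (1, 0.5), (1, 0.1)]:
    print(y, p, round(binary_crossentropy(y, p), 4))

print(round(mean_binary_crossentropy([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]), 4))
```

A model that outputs 0.5 for every image has a loss of ln 2 ≈ 0.693, which is a useful reference point when reading the first-epoch loss of an untrained network.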

As you know by now, data should be formatted into appropriately preprocessed floating-point tensors before being fed into the model. Currently, the data sits on a drive as `JPEG` files, so the steps for getting it into the model are roughly as follows:
1. Read the picture files;
2. Decode the JPEG content to RGB grids of pixels;
3. Convert these into floating-point tensors;
4. Resize them to a shared size (we’ll use 180 × 180);
5. Pack them into batches (we’ll use batches of 32 images).

Keras has utilities to take care of these steps automatically. In particular, Keras features the utility function `image_dataset_from_directory()`, which lets you quickly set up a data pipeline that can automatically turn image files on disk into batches of preprocessed tensors.

In [7]:
# Using image_dataset_from_directory to read images

from tensorflow.keras.utils import image_dataset_from_directory

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

Found 2000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 2000 files belonging to 2 classes.
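The number of batches per epoch follows from the file counts reported above and the batch size of 32. A minimal sketch, assuming the final partial batch is kept rather than dropped (the helper name `batch_sizes` is illustrative):

```python
def batch_sizes(n_samples, batch_size=32):
    """Sizes of the batches produced when the final partial batch is kept."""
    full_batches, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes

for name, n_samples in {"train": 2000, "validation": 1000, "test": 2000}.items():
    sizes = batch_sizes(n_samples)
    print(name, "batches:", len(sizes), "last batch size:", sizes[-1])
```

With 2,000 training images this gives 63 steps per epoch, the last of which contains only 16 images.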


In [8]:
# Fitting the model using a Dataset

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="/mnt/0A2AAC152AABFBB7/sideProjects/introCompVision/savedModels/convnet_from_scratch.keras",
        save_best_only=True,
        monitor="val_loss",
    )
]
history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks,
)

Epoch 1/30
26/63 ━━━━━━━━━━━━━━━━━━━━ 27s 731ms/step - accuracy: 0.5099 - loss: 0.6980

2024-05-10 15:15:01.769737: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: INVALID_ARGUMENT: Input is empty.
	 [[{{node decode_image/DecodeImage}}]]
	 [[IteratorGetNext]]


InvalidArgumentError: Graph execution error:

Detected at node decode_image/DecodeImage defined at (most recent call last):
<stack traces unavailable>
Input is empty.
	 [[{{node decode_image/DecodeImage}}]]
	 [[IteratorGetNext]] [Op:__inference_one_step_on_iterator_1705]
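The `Input is empty` message from `DecodeImage` is consistent with at least one zero-byte image file having been copied into the subset directories, although the traceback alone does not name the file. One way to investigate is to scan the directories for empty files. The sketch below uses only the standard library; `find_empty_files` is a hypothetical helper, and the demonstration runs on a throwaway temporary directory rather than on the real data.

```python
import pathlib
import tempfile

def find_empty_files(root_dir, suffix=".jpg"):
    """Return the sorted paths of zero-byte files with the given suffix under root_dir."""
    root = pathlib.Path(root_dir)
    return sorted(
        path for path in root.rglob(f"*{suffix}")
        if path.is_file() and path.stat().st_size == 0
    )

# Demonstration on a temporary directory that mimics the subset layout.
with tempfile.TemporaryDirectory() as tmp:
    cat_dir = pathlib.Path(tmp) / "train" / "Cat"
    cat_dir.mkdir(parents=True)
    (cat_dir / "0.jpg").write_bytes(b"\xff\xd8\xff")  # non-empty placeholder bytes
    (cat_dir / "1.jpg").write_bytes(b"")              # zero-byte file
    demo_names = [path.name for path in find_empty_files(tmp)]
    print(demo_names)
```

On the real data, calling `find_empty_files(new_base_dir)` and deleting whatever it returns before rebuilding the datasets would be one way to proceed. This check looks at file size only, so a file that is non-empty but still not decodable would need a separate check.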

In [None]:
# Displaying the shapes of the data and labels yielded by the Dataset
for data_batch, labels_batch in train_dataset:
    print("data batch shape:", data_batch.shape)
    print("labels batch shape:", labels_batch.shape)
    break  # one batch is enough to inspect the shapes

data batch shape: (32, 180, 180, 3)
labels batch shape: (32,)
