We will start by looking at a simple convnet example that classifies MNIST digits.
The following is a basic convnet: a stack of Conv2D and MaxPooling2D layers.
As usual, we will build the model with the Functional API:

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

In [2]:
inputs = keras.Input(shape=(28, 28, 1))    # (shape=(image_height, image_width, image_channels)), not including the batch dim.

x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)  # filters=32: the layer learns 32 feature detectors (edges, shapes, etc.)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)   # kernel_size=3: each filter looks at a 3 x 3 window
x = layers.MaxPooling2D(pool_size=2)(x)                              # pooling halves the height and width of the feature maps
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)

A convnet takes as input tensors of shape (image_height, image_width, image_channels), not including the batch dimension. Here we configure the convnet to process inputs of size (28, 28, 1), the format of the MNIST images, where the 1 means a single (grayscale) channel.

Let's display the architecture of our convnet:

In [3]:
model.summary()

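The output of every Conv2D and MaxPooling2D layer is a rank-3 tensor of shape (height, width, channels). The height and width shrink as we go deeper into the model, and the number of channels is controlled by the filters argument passed to each Conv2D layer (32, 64, or 128).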

After the last Conv2D layer, the output has shape (3, 3, 128): a 3 × 3 feature map with 128 channels. The next step is to feed this output into a densely connected classifier, but Dense layers process 1D vectors, whereas the current output is a rank-3 tensor. To make them compatible, we flatten the 3D output to 1D (3 × 3 × 128 = 1,152 values) before adding the Dense layer.
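To see where (3, 3, 128) comes from, the shapes can be traced by hand. The plain-Python sketch below assumes the Keras defaults used in the model above: Conv2D with stride 1 and padding="valid" (a 3 × 3 kernel trims one pixel from each border), and MaxPooling2D with a stride equal to pool_size (any leftover row or column is dropped). The helper names are hypothetical, chosen for illustration.

```python
def conv_output_size(size, kernel_size=3):
    # Conv2D defaults: stride 1 and padding="valid", so the window must fit fully inside the input
    return size - kernel_size + 1

def pool_output_size(size, pool_size=2):
    # MaxPooling2D default: stride equals pool_size; a leftover row/column is dropped
    return size // pool_size

# The MNIST model above, as ("conv", filters) or ("pool", None) entries
mnist_layers = [("conv", 32), ("pool", None), ("conv", 64), ("pool", None), ("conv", 128)]

def trace_shapes(size, channels, layer_specs):
    """Return the (height, width, channels) output of each layer for a square input."""
    shapes = []
    for kind, filters in layer_specs:
        if kind == "conv":
            size, channels = conv_output_size(size), filters
        else:
            size = pool_output_size(size)
        shapes.append((size, size, channels))
    return shapes

shapes = trace_shapes(28, 1, mnist_layers)
for shape in shapes:
    print(shape)
height, width, channels = shapes[-1]
print("Flatten ->", height * width * channels)  # Flatten -> 1152
```

Note that 11 // 2 = 5: the odd row and column are simply discarded by the second pooling layer.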

Now let's train our convnet on the MNIST digits. We use the sparse_categorical_crossentropy loss because our labels are integers rather than one-hot vectors.
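"Sparse" only describes the label format: for a single sample, the loss is minus the log of the probability the model assigns to the true class, whether the label is stored as an integer or as a one-hot vector. The plain-Python sketch below (hypothetical helper names, not the Keras implementation) shows that both forms give the same number for a toy 4-class softmax output.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    # The integer label indexes directly into the predicted probability vector
    return -math.log(probs[label])

def categorical_crossentropy(one_hot, probs):
    # Same quantity, but the target is a one-hot vector: only the true class term is nonzero
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

probs = [0.05, 0.05, 0.7, 0.2]  # toy softmax output over 4 classes
label = 2                       # integer label, as in train_labels
one_hot = [0, 0, 1, 0]          # the same label, one-hot encoded

loss_sparse = sparse_categorical_crossentropy(label, probs)
loss_onehot = categorical_crossentropy(one_hot, probs)
print(round(loss_sparse, 3), round(loss_onehot, 3))  # 0.357 0.357
```

With integer labels we can skip the one-hot encoding step entirely, which is why the sparse variant is the natural choice for MNIST.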

In [4]:
from tensorflow.keras.datasets import mnist

In [5]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(train_images, train_labels, epochs=5, batch_size=64)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Epoch 1/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 50s 52ms/step - accuracy: 0.8921 - loss: 0.3548
Epoch 2/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 81s 52ms/step - accuracy: 0.9851 - loss: 0.0489
Epoch 3/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 47s 50ms/step - accuracy: 0.9905 - loss: 0.0327
Epoch 4/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 83s 52ms/step - accuracy: 0.9928 - loss: 0.0226
Epoch 5/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 47s 50ms/step - accuracy: 0.9951 - loss: 0.0167


<keras.src.callbacks.history.History at 0x78313eaef2f0>

In [6]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.3f}")

313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - accuracy: 0.9888 - loss: 0.0337
Test accuracy: 0.992


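The convnet reaches a test accuracy of 99.2%, better than the densely connected model explored in earlier chapters. The gain comes from what convolution and max pooling add: convolution layers learn local patterns that they can then recognize anywhere in the image, and stacking convolution and max-pooling layers lets the network learn spatial hierarchies of patterns (more details in the book).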
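To make max pooling concrete, here is what MaxPooling2D(pool_size=2) does to a single channel: it keeps the maximum of each non-overlapping 2 × 2 window, halving the height and width. This is a plain-Python sketch for illustration (max_pool_2x2 is a hypothetical name), not how Keras implements the layer.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list; an odd trailing row/column is dropped."""
    rows, cols = len(feature_map) // 2, len(feature_map[0]) // 2
    return [
        [max(feature_map[2 * i][2 * j], feature_map[2 * i][2 * j + 1],
             feature_map[2 * i + 1][2 * j], feature_map[2 * i + 1][2 * j + 1])
         for j in range(cols)]
        for i in range(rows)
    ]

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 1, 5, 6],
      [2, 2, 7, 3]]
print(max_pool_2x2(fm))  # [[4, 2], [2, 7]]
```

Each output value only records whether a feature was strongly present somewhere in its window, which makes the next layer's 3 × 3 kernels cover a larger region of the original image.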

## Training convnets from scratch on a small dataset

We will classify images as dogs or cats, using a dataset of 5,000 pictures (2,500 cats and 2,500 dogs).

We will first naively train a small convnet from scratch on 2,000 training images, without any regularization, to set a baseline for what can be achieved. We will then explore data augmentation to improve the model.

In the next section, we will explore *feature extraction with a pretrained model* and *fine-tuning a pretrained model*, both of which will improve our model substantially.

Let's download the dataset from Kaggle. To do that, I first need to authenticate with Kaggle using my API token:

In [7]:
import json

In [8]:
token = {
    'username': 'mainasaid93',
    'key': '<your-kaggle-api-key>'   # never publish a real API key in a notebook
}

with open("kaggle.json", "w") as t:   # creates a JSON file called kaggle.json in write ('w') mode
  json.dump(token, t)                 # writes the token dictionary above into the file as JSON

!mkdir -p ~/.kaggle                # creates the ~/.kaggle folder (-p: no error if it already exists)
!cp kaggle.json ~/.kaggle/         # copies the key file into it
!chmod 600 ~/.kaggle/kaggle.json   # makes the file readable and writable only by its owner (me)
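Hardcoding the API key means anyone who sees the notebook can use it. A minimal alternative, sketched in plain Python under the assumption that the Kaggle CLI reads ~/.kaggle/kaggle.json (as the cell above relies on), prompts for the key with getpass and performs the mkdir/cp/chmod steps with pathlib and os.chmod. write_kaggle_credentials is a hypothetical helper, not part of the Kaggle tooling.

```python
import json
import os
import stat
import tempfile
from pathlib import Path

def write_kaggle_credentials(username, key, config_dir=None):
    """Hypothetical helper: write kaggle.json and restrict it to the owner (like chmod 600)."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)      # like `mkdir -p`: no error if it exists
    cred_path = config_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    os.chmod(cred_path, stat.S_IRUSR | stat.S_IWUSR)   # owner read/write only (0o600)
    return cred_path

# In the notebook you would prompt for the key instead of typing it into a cell:
#   from getpass import getpass
#   write_kaggle_credentials("<your-kaggle-username>", getpass("Kaggle API key: "))

# Demo against a throwaway directory so the real ~/.kaggle is untouched
demo_dir = tempfile.mkdtemp()
demo_path = write_kaggle_credentials("demo-user", "demo-key", demo_dir)
print(oct(stat.S_IMODE(demo_path.stat().st_mode)))  # 0o600 on Linux/macOS
```

Because the key is only typed at runtime, the saved notebook never contains it.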

In [9]:
!kaggle datasets list

ref                                                           title                                                     size  lastUpdated                 downloadCount  voteCount  usabilityRating  
------------------------------------------------------------  --------------------------------------------------  ----------  --------------------------  -------------  ---------  ---------------  
rohiteng/amazon-sales-dataset                                 Amazon Sales Dataset                                   4037578  2025-11-23 14:29:37.973000           2916         37  1.0              
wardabilal/spotify-global-music-dataset-20092025              Spotify Global Music Dataset (2009–2025)               1289021  2025-11-11 09:43:05.933000          10182        231  1.0              
khushikyad001/ai-impact-on-jobs-2030                          AI Impact on Jobs 2030                                   87410  2025-11-09 17:58:05.410000           5856        132  1.0              
mayabennet

The listing shows that authentication worked. Now let's download the dataset for this model.

In [10]:
!kaggle competitions download -c dogs-vs-cats

401 Client Error: Unauthorized for url: https://www.kaggle.com/api/v1/competitions/data/download-all/dogs-vs-cats


The request fails with a 401 Unauthorized error: the competition has officially ended, so I could not join it. To practice with the data anyway, I downloaded a copy that was published as a regular Kaggle dataset, using kaggle datasets download with that dataset's slug (tongpython/cat-and-dog) in place of kaggle competitions download -c dogs-vs-cats.

In [11]:
!kaggle datasets download -d tongpython/cat-and-dog

Dataset URL: https://www.kaggle.com/datasets/tongpython/cat-and-dog
License(s): CC0-1.0
Downloading cat-and-dog.zip to /content
 61% 133M/218M [00:00<00:00, 1.39GB/s]
100% 218M/218M [00:00<00:00, 821MB/s] 


In [12]:
! unzip -qq cat-and-dog.zip -d cat-and-dog  # unzips it to a folder named cat-and-dog

Rather than downloading the data every time I come back to this notebook on Colab, I downloaded it to my computer and uploaded it to Google Drive, so it is always available. Let's mount Drive:

In [13]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [14]:
import os, shutil, pathlib, zipfile

I want my folder structure to match the one in the book. This is also a chance to practice using the os, pathlib, and shutil libraries: I could have arranged the folders when creating the zip file, but I am doing these extra (unnecessary) steps on purpose.

### Recreating the book's folder structure


In [15]:
old_path = "/content/drive/MyDrive/cats_vs_dogs_small"
new_path = "/content/drive/MyDrive/cats_vs_dogs_original"

os.rename(old_path, new_path)

print("Folder renamed successfully!")
print(new_path)

Folder renamed successfully!
/content/drive/MyDrive/cats_vs_dogs_original


In [16]:
new_small = "/content/drive/MyDrive/cats_vs_dogs_small"
os.makedirs(new_small, exist_ok=True)

Now the folder layout matches the book's, so we can continue with the book's code.

Recap: I first downloaded the raw data, extracted it, and moved it into the folder. I did some extra steps with pathlib and os just for practice.
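Since the goal here is practice, the same rename, create, and copy steps can also be written with pathlib. The sketch below works in a temporary directory (a stand-in for /content/drive/MyDrive, so nothing on Drive is touched) and uses Path.rename (which returns the new path on Python 3.8+), Path.mkdir(exist_ok=True), and shutil.copyfile with Path arguments. The fake image file is only there to have something to copy.

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # throwaway stand-in for /content/drive/MyDrive

old = root / "cats_vs_dogs_small"
old.mkdir()
(old / "cat.1.jpg").write_bytes(b"fake image bytes")

new = old.rename(root / "cats_vs_dogs_original")   # pathlib version of os.rename
small = root / "cats_vs_dogs_small"
small.mkdir(parents=True, exist_ok=True)           # pathlib version of os.makedirs(..., exist_ok=True)
shutil.copyfile(new / "cat.1.jpg", small / "cat.1.jpg")  # shutil accepts Path objects directly

print(sorted(p.name for p in root.iterdir()))  # ['cats_vs_dogs_original', 'cats_vs_dogs_small']
```

The / operator joins path components, which avoids the string concatenation and os.path.join calls used elsewhere in the notebook.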

In [17]:
os.makedirs("/content/drive/MyDrive/train", exist_ok=True)


Now let's copy the data into the new train folder:

In [19]:
# cats_src = "/content/drive/MyDrive/cats_vs_dogs_original/training_set/training_set/cats"
# dogs_src = "/content/drive/MyDrive/cats_vs_dogs_original/training_set/training_set/dogs"

# train_dir = "/content/drive/MyDrive/train"

# # Copy cat images
# for i, fname in enumerate(sorted(os.listdir(cats_src))):
#     if fname.lower().endswith(".jpg"):
#         shutil.copyfile(
#             os.path.join(cats_src, fname),
#             os.path.join(train_dir, f"cat.{i}.jpg")
#         )

# # Copy dog images
# for i, fname in enumerate(sorted(os.listdir(dogs_src))):
#     if fname.lower().endswith(".jpg"):
#         shutil.copyfile(
#             os.path.join(dogs_src, fname),
#             os.path.join(train_dir, f"dog.{i}.jpg")
#         )

In [20]:
#original_directory = pathlib.Path("cat-and-dog/training_set/training_set") # path to the directory where our original dataset was stored
#new_base_dir = pathlib.Path("cats_and_dogs_small")  # directory where our small dataset will be stored

In [21]:
files = os.listdir("/content/drive/MyDrive/train")
print(len(files))

8005


Let me inspect the contents of our dataset.

In [22]:
print(files[:20])

['dog.3006.jpg', 'dog.3007.jpg', 'dog.3008.jpg', 'dog.3009.jpg', 'dog.3010.jpg', 'dog.3011.jpg', 'dog.3012.jpg', 'dog.3013.jpg', 'dog.3014.jpg', 'dog.3015.jpg', 'dog.3016.jpg', 'dog.3017.jpg', 'dog.3018.jpg', 'dog.3019.jpg', 'dog.3020.jpg', 'dog.3021.jpg', 'dog.3022.jpg', 'dog.3023.jpg', 'dog.3024.jpg', 'dog.3025.jpg']


In [23]:
print(files[-20:])

['cat.991.jpg', 'cat.992.jpg', 'cat.993.jpg', 'cat.994.jpg', 'cat.995.jpg', 'cat.996.jpg', 'cat.997.jpg', 'cat.998.jpg', 'cat.999.jpg', 'cat.1000.jpg', 'cat.1001.jpg', 'cat.1002.jpg', 'cat.1003.jpg', 'cat.1004.jpg', 'cat.1005.jpg', 'cat.1.jpg', 'cat.2.jpg', 'cat.3.jpg', 'cat.4.jpg', 'cat.5.jpg']


In [24]:
cats_count = 0
dogs_counts = 0
for i in files:
  if i.startswith("cat"):
    cats_count += 1
  elif i.startswith("dog"):
    dogs_counts += 1

print(f"cats: {cats_count}")
print(f"dogs: {dogs_counts}")

cats: 4000
dogs: 4005


Or, more concisely and Pythonically:

In [25]:
cat_files = [fname for fname in files if fname.startswith("cat")]  # a list of file names, not a count
dog_files = [fname for fname in files if fname.startswith("dog")]

print(f"length of cat pictures: {len(cat_files)}")
print(f"length of dog pictures: {len(dog_files)}")

length of cat pictures: 4000
length of dog pictures: 4005
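A third option counts every class in one pass with collections.Counter, using the part of the file name before the first dot as the class. The short list below is a stand-in for files from os.listdir; in the notebook you would pass files itself.

```python
from collections import Counter

sample_files = ["dog.3006.jpg", "cat.991.jpg", "cat.1.jpg", "dog.3007.jpg", "cat.2.jpg"]
counts = Counter(fname.split(".")[0] for fname in sample_files)  # "cat.991.jpg" -> "cat"
print(counts)  # Counter({'cat': 3, 'dog': 2})
```

This also reveals any unexpected prefix (for example a stray non-image file) as an extra key, instead of silently ignoring it.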


So we have 4,000 cat pictures and 4,005 dog pictures, an almost even split. We are not going to use all 8,005 images, only 5,000 of them for training, validation, and testing.

Copying images to the training, validation, and test directories:

In [26]:
original_dir = pathlib.Path("/content/drive/MyDrive/train")  # directory holding all the original (renamed) images
new_base_dir = pathlib.Path("/content/drive/MyDrive/cats_vs_dogs_small")  # directory where we will store our smaller dataset

def make_subset(subset_name, start_index, end_index):
  for category in ("cat", "dog"):
    dest_dir = new_base_dir / subset_name / category   # e.g. cats_vs_dogs_small/train/cat
    os.makedirs(dest_dir, exist_ok=True)

    # Sort so that every call slices the same list; the index ranges below never overlap.
    all_files = sorted(f for f in os.listdir(original_dir) if f.startswith(category + "."))
    selected_files = all_files[start_index:end_index]

    for fname in selected_files:
      shutil.copyfile(src=original_dir / fname,
                      dst=dest_dir / fname)

make_subset("train", start_index=0, end_index=1000)          # 1,000 images per class
make_subset("validation", start_index=1000, end_index=1500)  # 500 images per class
make_subset("test", start_index=1500, end_index=2500)        # 1,000 images per class (2,500 per class in total)

In [27]:
print(original_dir)
print(new_base_dir)

/content/drive/MyDrive/train
/content/drive/MyDrive/cats_vs_dogs_small
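After copying, it is worth checking that each folder holds the expected number of images and that no file ended up in two subsets. summarize_split below is a hypothetical helper; the demo builds a tiny fake tree in a temporary directory, and in the notebook you would call summarize_split(new_base_dir) instead.

```python
import tempfile
from pathlib import Path

def summarize_split(base_dir, subsets=("train", "validation", "test"), categories=("cat", "dog")):
    """Count files per subset/category folder and report file names found in more than one subset."""
    base_dir = Path(base_dir)
    counts, seen = {}, {}
    for subset in subsets:
        for category in categories:
            names = [p.name for p in (base_dir / subset / category).iterdir()]
            counts[(subset, category)] = len(names)
            for name in names:
                seen.setdefault(name, []).append(subset)
    duplicates = {name: where for name, where in seen.items() if len(where) > 1}
    return counts, duplicates

# Tiny fake tree: 4 train, 2 validation, 2 test images per class
demo_root = Path(tempfile.mkdtemp())
for subset, start, end in [("train", 0, 4), ("validation", 4, 6), ("test", 6, 8)]:
    for category in ("cat", "dog"):
        folder = demo_root / subset / category
        folder.mkdir(parents=True)
        for i in range(start, end):
            (folder / f"{category}.{i}.jpg").touch()

counts, duplicates = summarize_split(demo_root)
print(counts)
print(duplicates)  # {} means no image was copied into two subsets
```

An empty duplicates dictionary confirms that the validation and test images were never seen during training.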


I adapted the paths in my code to my Google Drive directories. My indexing in the make_subset function also differs from the book's: the book builds the file names directly from the index range (cat.0.jpg, cat.1.jpg, and so on), which assumes the files are numbered consecutively in steps of 1. The files I uploaded are not numbered that way, so I sort the file names alphabetically and slice the sorted list instead.
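"Alphabetical" here means plain string order, which differs from numeric order once numbers have different lengths. Either order works for the split, as long as the same sorted list is sliced every time, because the slices never overlap. If numeric order is preferred, a sort key can extract the number; numeric_key below is a hypothetical helper that assumes names of the form category.number.jpg.

```python
import re

names = ["cat.10.jpg", "cat.2.jpg", "cat.1.jpg", "cat.100.jpg"]

print(sorted(names))  # ['cat.1.jpg', 'cat.10.jpg', 'cat.100.jpg', 'cat.2.jpg']

def numeric_key(fname):
    # Sort by the integer between the first and second dot, e.g. "cat.10.jpg" -> 10
    return int(re.search(r"\.(\d+)\.", fname).group(1))

print(sorted(names, key=numeric_key))  # ['cat.1.jpg', 'cat.2.jpg', 'cat.10.jpg', 'cat.100.jpg']
```

Passing key=numeric_key to sorted in make_subset would make each subset follow numeric order without changing how many images it receives.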

Now let's build our model. The Rescaling layer rescales the image inputs, whose values are originally in the [0, 255] range, to the [0, 1] range.

In [28]:
from tensorflow import keras
from tensorflow.keras import layers

In [29]:
inputs = keras.Input(shape=(180, 180, 3))
x = layers.Rescaling(1./255)(inputs)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

Let's see how the dimensions of the feature maps change with every successive layer:

In [30]:
model.summary()
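A quick way to sanity-check the summary is to recompute the parameter counts by hand. This plain-Python sketch assumes the usual formulas for these layers: a Conv2D layer has one kernel_size × kernel_size × in_channels kernel plus one bias per filter, a Dense layer has one weight per input and unit plus one bias per unit, and Rescaling, MaxPooling2D, and Flatten have no parameters. The helper names are hypothetical; the total should agree with the Total params line of model.summary().

```python
def conv2d_params(kernel_size, in_channels, filters):
    # one kernel_size x kernel_size x in_channels kernel per filter, plus one bias per filter
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(in_features, units):
    # one weight per input feature and unit, plus one bias per unit
    return (in_features + 1) * units

conv_channels = [3, 32, 64, 128, 256, 256]  # RGB input, then the filters of each Conv2D layer
conv_total = sum(conv2d_params(3, c_in, c_out)
                 for c_in, c_out in zip(conv_channels[:-1], conv_channels[1:]))

# Spatial size: four Conv2D(kernel_size=3) + MaxPooling2D(2) pairs, then a final Conv2D
size = 180
for _ in range(4):
    size = (size - 2) // 2   # 180 -> 89 -> 43 -> 20 -> 9
size -= 2                    # 9 -> 7
flatten_size = size * size * conv_channels[-1]  # 7 * 7 * 256 = 12544

total = conv_total + dense_params(flatten_size, 1)
print(total)  # 991041
```

Almost 60% of the parameters sit in the last Conv2D layer alone (590,080), because it maps 256 channels to 256 channels.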

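Now let's compile our model. Since this is a binary classification problem and the model ends with a single sigmoid unit, we use the binary_crossentropy loss: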

In [31]:
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])
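With a single sigmoid output p (the predicted probability of the positive class) and a 0/1 label y, binary crossentropy for one sample is -(y·log p + (1 − y)·log(1 − p)), averaged over the batch. The plain-Python sketch below (a hypothetical helper, not the Keras implementation) computes it for four made-up predictions, assuming cat is encoded as 0 and dog as 1.

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Mean binary crossentropy over a batch; y_pred are sigmoid outputs in (0, 1)."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p)) for y, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

labels = [1, 0, 1, 0]         # assumed encoding: 1 = dog, 0 = cat
preds = [0.9, 0.2, 0.6, 0.4]  # made-up sigmoid outputs of the final Dense(1) layer
print(round(binary_crossentropy(labels, preds), 4))  # 0.3375
```

Confident correct predictions (0.9 for a dog) contribute little loss, while hesitant ones (0.6 and 0.4) contribute the most.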

### Data preprocessing