We will start by taking a look at a simple convnet example that classifies the MNIST digits.
The following shows an example of a basic Convnet; a stack of Conv2D and MaxPooling2D layers.
And as we mostly do, we will use the functional API to build the model:

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

In [2]:
inputs = keras.Input(shape=(28, 28, 1))    # (shape=(image_height, image_width, image_channels)), not including the batch dim.

x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs) #filters=32 means the layer will learn 32 feature detectors like edges, shapes etc.
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)   # kernel size is the size of the filter window, 3 * 3
x = layers.MaxPooling2D(pool_size=2)(x)                            # pooling reduces the spatial size for better learning
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)

A convnet takes input tensors of shape(image_height, image_width, image_channel) without including the batch dimension. Here, we wukk configure the convnet to process inputs of size (28, 28, 1) — the format of the MNIST images. with 1 representing grayscale.

Lets display the architecture of our convent.

In [3]:
model.summary()

You can see each output of the Conv2D and Maxpooling layer is a rank-3 tensor, with the filter argument passed to the Conv2D layer controlling the number of channels.

After the last Conv2D layer, we ended up with (3, 3, 128) output shape. that is a 3 by 3 feature map with 128 channels. Then we feed this output layer into a densely connected classifer that processes 1D vectors. So for them to be compatible, we flatten them out to 1D before adding the dense layer.

Now lets train our convnet using the mnist dataset. we will use the sparse_categorical_crossentropy because our labels are integers

In [4]:
from tensorflow.keras.datasets import mnist

In [5]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255
model.compile(optimizer="rmsprop",
      loss="sparse_categorical_crossentropy",
      metrics=["accuracy"])

model.fit(train_images, train_labels, epochs=5, batch_size=64)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step
Epoch 1/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m57s[0m 59ms/step - accuracy: 0.8847 - loss: 0.3702
Epoch 2/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m54s[0m 57ms/step - accuracy: 0.9857 - loss: 0.0469
Epoch 3/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m53s[0m 57ms/step - accuracy: 0.9899 - loss: 0.0316
Epoch 4/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m82s[0m 57ms/step - accuracy: 0.9933 - loss: 0.0221
Epoch 5/5
[1m938/938[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m82s[0m 56ms/step - accuracy: 0.9948 - loss: 0.0168


<keras.src.callbacks.history.History at 0x7c9f516db260>

In [6]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.3f}")

[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 12ms/step - accuracy: 0.9889 - loss: 0.0436
Test accuracy: 0.990


We can see we have an accuracy as high as 99.2%. This works better than the densely connnected model explored in earlier chapters. This is because of features like filters and Maxpooling — more details in the book.

## Training convnets from scratch on a small dataset

we will classify images as dogs and cats in a dataset containing 5000 pictures of cats and dogs(2500 cats, and 2500 pics of dogs).

We will first naively train 2000 images from scratch without regularization, to set a baseline for what can be achieved. Before then exploring data augmentation to imporve the model.

In the next section, we will explore *feature extraction with a pretrained model* and *fine tuning a pretrained model*, all of which will improve our model immensely.

Lets download the data set from kaggle. But doing that, I will need to authenticate myself on kaggel using the kaggle token. Lets do it :

In [8]:
import json

In [11]:
token = {
     'username': 'mainasaid93',
     'key': 'KGAT_d2c2edae0d4e484013aec1c00e95764c'
 }

with open("kaggle.json", "w") as t:         # creates a json file called kaggle.json, 'w' write mode
  json.dump(token, t)                # dumps the token (the dictionary above) into the file t. in a proper json format.

!mkdir ~/.kaggle              # creating a kaggle folder
!cp kaggle.json ~/.kaggle/   # coppying the key file to it.
!chmod 600 ~/.kaggle/kaggle.json  # making it only readable by the user, that is myself in this case.

mkdir: cannot create directory ‘/root/.kaggle’: File exists


In [12]:
!kaggle datasets list

ref                                                               title                                                     size  lastUpdated                 downloadCount  voteCount  usabilityRating  
----------------------------------------------------------------  --------------------------------------------------  ----------  --------------------------  -------------  ---------  ---------------  
wardabilal/spotify-global-music-dataset-20092025                  Spotify Global Music Dataset (2009–2025)               1289021  2025-11-11 09:43:05.933000           8556        191  1.0              
sadiajavedd/students-academic-performance-dataset                 Students_Academic_Performance_Dataset                     8907  2025-10-23 04:16:35.563000          13640        330  1.0              
prince7489/youtube-shorts-performance-dataset                     YouTube Shorts Performance Dataset                        6541  2025-11-25 09:23:36.147000            900         27  0.941176

This shows everything has worked. Let me now download the dataset needed for this model.

In [14]:
!kaggle competitions download -c dogs-vs-cats

401 Client Error: Unauthorized for url: https://www.kaggle.com/api/v1/competitions/data/download-all/dogs-vs-cats


The competition has officically ended so I cannot join, that is why the above code will not work. To work with the dataset for practice like i am doing, just download the datasets — done by only changing *competitions* with *datasets* in the code.

In [15]:
!kaggle datasets download -d tongpython/cat-and-dog

Dataset URL: https://www.kaggle.com/datasets/tongpython/cat-and-dog
License(s): CC0-1.0
Downloading cat-and-dog.zip to /content
 93% 203M/218M [00:00<00:00, 235MB/s]
100% 218M/218M [00:02<00:00, 80.1MB/s]
