1. First convolutional layer: consists of 16 5×5 filters (layer size = 28×28×16=12544)
2. First pooling layer: reduces the image by a factor of 2 in all directions (layer size = 14×14×16=3136)
3. Second convolutional layer: consists of 32 5×5 filters (layer size = 14×14×32=6272)
4. Second pooling layer: reduces the image by a factor of 2 in all directions (layer size = 7×7×32=1568)
5. Dense layer: fully-connected layer of 128 nodes
6. Output layer: 10 neurons corresponding to the 10 classes (digits from 0-9)
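
The layer sizes quoted above follow from simple arithmetic. The sketch below reproduces them in plain Python, assuming a 28×28 single-channel input, "same" padding in the convolutions (so height and width are unchanged) and 2×2 pooling with a stride of 2. The helper names are illustrative and not part of Keras.

```python
import math

def conv_same(shape, filters):
    """A "same"-padded convolution keeps height and width; depth becomes the filter count."""
    height, width, _ = shape
    return (height, width, filters)

def pool_2x2(shape):
    """2x2 pooling with stride 2 halves height and width (rounding up with "same" padding)."""
    height, width, depth = shape
    return (math.ceil(height / 2), math.ceil(width / 2), depth)

def size(shape):
    """Total number of values in a layer of the given shape."""
    return shape[0] * shape[1] * shape[2]

shape = (28, 28, 1)  # assumed input: one 28x28 grayscale image
layer_sizes = {}
for name, step in [
    ("conv1", lambda s: conv_same(s, 16)),
    ("pool1", pool_2x2),
    ("conv2", lambda s: conv_same(s, 32)),
    ("pool2", pool_2x2),
]:
    shape = step(shape)
    layer_sizes[name] = size(shape)
    print(f"{name}: {shape} -> {size(shape)} values")
```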

In [1]:
import tensorflow as tf

In [2]:
model = tf.keras.models.Sequential([
    # first convolutional layer
    tf.keras.layers.Conv2D( # layer for 2-dimensional image
        filters=16,
        kernel_size=5,
        padding="same", # zero-pad the edges so the output keeps the input's height and width
        activation=tf.nn.relu
    ),
    # first pooling layer
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding="same"),
    # second convolutional layer
    tf.keras.layers.Conv2D(
        filters=32,
        kernel_size=5,
        padding="same",
        activation=tf.nn.relu
    ),
    # second pooling layer
    tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2), padding="same"),
    # flatten layer into a linear set of nodes
    tf.keras.layers.Flatten(),
    # add a fully connected layer of 128 nodes
    tf.keras.layers.Dense(128, activation="relu"),
    # use drop-out regularization to randomly ignore 40% of the nodes each training cycle
    tf.keras.layers.Dropout(0.4),
    # output layer
    tf.keras.layers.Dense(10, activation="softmax")
])
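
To get a feel for where the model's capacity sits, the trainable parameters of each layer can be counted by hand. This is a rough sketch in plain Python, assuming a 28×28×1 input (the model definition above does not state an input shape) and one bias per filter or node. The variable names are illustrative.

```python
# each convolution filter has kernel_height * kernel_width * input_channels weights plus one bias
conv1 = (5 * 5 * 1 + 1) * 16
conv2 = (5 * 5 * 16 + 1) * 32

# after two rounds of pooling the 28x28 image is 7x7 with 32 channels
flattened = 7 * 7 * 32

# each dense node has one weight per input plus one bias
dense = (flattened + 1) * 128
output = (128 + 1) * 10

total = conv1 + conv2 + dense + output
print(conv1, conv2, dense, output, total)
```

Under these assumptions most of the parameters sit in the 128-node dense layer. Once the model has been built, model.summary() reports the same kind of breakdown.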

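We will train the model using sparse categorical cross-entropy as the loss function, which takes the labels as plain integers (0-9) rather than one-hot vectors. No optimizer is specified, so Keras uses its default (RMSprop).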

In [3]:
model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
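
As a rough illustration of what this loss measures: for each example it is the negative log of the probability the model assigned to the correct class, with the label given as a plain integer rather than a one-hot vector. The function below is a hand-written sketch for a single example, not the Keras implementation, and the probabilities are made up.

```python
import math

def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probabilities[label])

# made-up softmax output over the 10 digit classes (sums to 1)
probs = [0.01, 0.01, 0.02, 0.01, 0.85, 0.02, 0.02, 0.03, 0.02, 0.01]

confident_and_right = sparse_categorical_crossentropy(4, probs)  # true digit is 4
confident_and_wrong = sparse_categorical_crossentropy(7, probs)  # true digit is 7
print(f"{confident_and_right:.3f} {confident_and_wrong:.3f}")
```

The loss is small when the model puts most of its probability on the true digit and grows quickly when it does not.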

We load the MNIST data, which comes already split into training and test sets.

In [4]:
import tensorflow as tf
import tensorflow_datasets as tfds

ds_train, ds_test = tfds.load(
    "mnist",
    split=["train", "test"],
    as_supervised=True,
)

We need to rescale the pixel values from the range 0-255 to 0-1. Then we shuffle the training data (with a shuffle buffer of 1000 examples) and put both sets into batches of 128.

In [5]:
def normalize_img(image, label):
    return tf.cast(image, tf.float32) / 255., label

ds_train = ds_train.map(normalize_img)
ds_train = ds_train.shuffle(1000).batch(128)

ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(128)
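
To see what the rescaling and batching amount to, here is a small sketch in plain Python. It assumes the standard MNIST split of 60,000 training and 10,000 test images. The final batch of each set is smaller than the rest because neither count is a multiple of 128.

```python
import math

def normalize_pixel(value):
    """Rescale an 8-bit pixel value (0-255) to the range 0-1."""
    return value / 255.0

print(normalize_pixel(0), normalize_pixel(255))

batch_size = 128
train_batches = math.ceil(60_000 / batch_size)  # assumed size of the MNIST training set
test_batches = math.ceil(10_000 / batch_size)   # assumed size of the MNIST test set
last_train_batch = 60_000 - (train_batches - 1) * batch_size
print(train_batches, test_batches, last_train_batch)
```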

Now we can fit the model to the data.

In [6]:
model.fit(
    ds_train,
    validation_data=ds_test,
    epochs=2,
)

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x7cddfea29c60>

We can now test the model on some additional real-world images: the digits 1 to 9 and, as an out-of-place example, a picture of a dog.

In [8]:
from urllib.request import urlretrieve

for i in list(range(1,10)) + ["dog"]:
    urlretrieve(f"https://github.com/milliams/intro_deep_learning/raw/master/{i}.png", f"{i}.png")

import numpy as np
from skimage.io import imread

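images = []
for i in list(range(1,10)) + ["dog"]:
    # read each image and rescale its pixel values from 0-255 to 0-1
    images.append(np.array(imread(f"{i}.png")/255.0, dtype="float32"))
# stack into one array and add a trailing channel axis: (10, 28, 28) -> (10, 28, 28, 1)
images = np.array(images)[:,:,:,np.newaxis]
images.shape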

(10, 28, 28, 1)

We have 10 images of 28×28 pixels each, with 1 color channel (grayscale).
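
Keras convolution layers expect a batch of images with an explicit channel axis, which is why the cell above indexes with np.newaxis. A minimal sketch with placeholder arrays (zeros standing in for the real images):

```python
import numpy as np

# ten placeholder 28x28 grayscale images, standing in for the downloaded files
stack = np.zeros((10, 28, 28), dtype="float32")

# indexing with np.newaxis appends a length-1 channel axis
with_channel = stack[:, :, :, np.newaxis]
print(stack.shape, with_channel.shape)

# np.expand_dims is an equivalent spelling
same = np.expand_dims(stack, axis=-1)
```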

We can apply the model to these images to make predictions.

In [10]:
probabilities = model.predict(images)

truths = list(range(1, 10)) + ["dog"]

table = []
for truth, probs in zip(truths, probabilities):
    prediction = probs.argmax()
    if truth == 'dog':
        print(f"{truth}. CNN thinks it's a {prediction} ({probs[prediction]*100:.1f}%)")
    else:
        print(f"{truth} at {probs[truth]*100:4.1f}%. CNN thinks it's a {prediction} ({probs[prediction]*100:4.1f}%)")
    table.append((truth, probs))

1 at 13.0%. CNN thinks it's a 8 (38.9%)
2 at 48.7%. CNN thinks it's a 2 (48.7%)
3 at  5.2%. CNN thinks it's a 0 (54.2%)
4 at 95.1%. CNN thinks it's a 4 (95.1%)
5 at 99.0%. CNN thinks it's a 5 (99.0%)
6 at  3.0%. CNN thinks it's a 3 (34.8%)
7 at 45.4%. CNN thinks it's a 7 (45.4%)
8 at 29.1%. CNN thinks it's a 8 (29.1%)
9 at  2.9%. CNN thinks it's a 0 (40.1%)
dog. CNN thinks it's a 8 (56.9%)
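
The printed results can be tallied to get an overall score for the nine digit images. The pairs below are transcribed by hand from the output above, so this summarizes that one run rather than measuring the model in general.

```python
# (true digit, predicted digit) pairs copied from the output above; the dog is left out
results = [(1, 8), (2, 2), (3, 0), (4, 4), (5, 5), (6, 3), (7, 7), (8, 8), (9, 0)]

correct = [truth for truth, prediction in results if truth == prediction]
accuracy = len(correct) / len(results)
print(f"{len(correct)} of {len(results)} correct ({accuracy*100:.1f}%)")
```

Note that the dog is still assigned a digit: the softmax output always sums to 1 over the 10 classes, so the model has no way to answer "none of these".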
