# Training Custom TensorFlow Models on the MNIST Dataset

This notebook demonstrates how to train a custom CNN model on the MNIST dataset using TensorFlow.

To run this notebook, I am using the following package versions:

TensorFlow: 2.8.0

TensorFlow Datasets: 4.6.0

There might be minor variations in the code for different versions of these packages.

This notebook was created as supplementary material for the series: Everything you need to know about CNNs: Quick Introduction to Model Training (Link: )

## Imports

In [24]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import random_uniform, glorot_uniform, constant, identity
import matplotlib.pyplot as plt
from IPython import display

## Inputs and Parameters

Here we load the MNIST dataset using TensorFlow Datasets and divide its `train` split into training (70%), validation (15%) and test (15%) sets.

We'll also assign the batch size.

In [25]:
# Define the batch size to be used in this notebook
BATCH_SIZE = 128

splits = ["train[:70%]", "train[70%:85%]", "train[85%:]"]

# load the dataset given the splits defined above
splits, info = tfds.load("mnist", with_info=True, as_supervised=True, split=splits)

(train_examples, validation_examples, test_examples) = splits

num_examples = info.splits["train"].num_examples
num_classes = info.features["label"].num_classes
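
As a quick sanity check on the percentage slices above, the sketch below works out how many examples each split should contain and how many batches one training epoch covers. It assumes the 60,000-example MNIST `train` split and boundaries that fall on whole examples (which they do for 60,000). The helper name `split_sizes` is hypothetical and not part of TensorFlow Datasets.

```python
import math

def split_sizes(total, boundaries):
    """Return the size of each slice given cumulative percentage boundaries."""
    # Convert percentage boundaries (e.g. 70, 85) into example indices.
    cuts = [0] + [total * pct // 100 for pct in boundaries] + [total]
    return [end - start for start, end in zip(cuts, cuts[1:])]

TOTAL_TRAIN = 60_000  # size of the MNIST "train" split
train_n, val_n, test_n = split_sizes(TOTAL_TRAIN, [70, 85])
print(train_n, val_n, test_n)  # 42000 9000 9000

# Batches per epoch for a batch size of 128 (the last batch is partial).
BATCH_SIZE = 128
steps_per_epoch = math.ceil(train_n / BATCH_SIZE)
print(steps_per_epoch)  # 329
```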

Let's view some of the details of the dataset:

In [26]:
num_classes

10

In [27]:
info

tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='~\\tensorflow_datasets\\mnist\\3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)

## Preprocessing

We'll need to do a little bit of processing before we can use the data. This includes:

* Type casting

* Normalizing the pixel values

* Reshaping each image to include the channel dimension (the trailing 1 for grayscale images; an RGB image would have 3 channels). The batch dimension is added later by `.batch()`, giving 4 dimensions in total.
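
To make these three steps concrete, here is a minimal NumPy sketch applied to a single stand-in image. The random array is a made-up placeholder rather than real MNIST data, and NumPy is used only for illustration; the notebook itself performs the same steps with TensorFlow ops.

```python
import numpy as np

# Stand-in for one raw 28x28 grayscale image with uint8 pixels (not real MNIST data).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

image = raw.astype(np.float32)     # 1. type casting
image = image / 255.0              # 2. scale pixel values into [0, 1]
image = image.reshape(28, 28, 1)   # 3. add the trailing channel dimension

print(image.dtype, image.shape)    # float32 (28, 28, 1)
```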


In [28]:
# Cast the image to float32, normalize the pixel values and fix the shape
def format_image(image, label):
    """
    Cast the image to float32, scale the pixel values to [0, 1]
    and reshape it to (28, 28, 1). The label is returned unchanged.
    """
    image = tf.cast(image, dtype=tf.float32)
    image = image / 255.0
    image = tf.reshape(image, shape=(28, 28, 1))
    return image, label

# Prepare training, test and validation batches
train_batches = train_examples.shuffle(num_examples // 4).map(format_image).batch(BATCH_SIZE).prefetch(tf.data.experimental.AUTOTUNE)
validation_batches = validation_examples.map(format_image).batch(BATCH_SIZE).prefetch(tf.data.experimental.AUTOTUNE)
test_batches = test_examples.map(format_image).batch(1)



Note: This is a simple example that demonstrates building and training models with TensorFlow's functional API. We ignore techniques for improving model performance, such as data augmentation.

In [29]:
train_batches

<PrefetchDataset element_spec=(TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int64, name=None))>

## Stack up the layers


Let's stack up the layers in the following order:

* Add zero-padding of 3 pixels on each side of the input

* Stage 1: Two back-to-back 2D convolutions with 64 filters of size 3x3, stride 1 and ReLU activation, followed by a single max pooling with a window size of 3x3 and a stride of 2

* Stage 2: Two back-to-back 2D convolutions with 128 filters of size 3x3, stride 1 and ReLU activation, followed by a single max pooling with a window size of 3x3 and a stride of 2

* Stage 3: Flattening followed by a dense layer with ReLU activation and a final dense layer with softmax activation
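
The spatial size after each layer follows from the standard formula for unpadded ("valid") convolution and pooling, `out = (in - kernel) // stride + 1`. The short sketch below traces a 28x28 input through the stack described above. `out_size` is a hypothetical helper written for this illustration, and it assumes the Keras default `padding='valid'` for the convolution and pooling layers.

```python
def out_size(size, kernel, stride=1):
    """Output width/height of an unpadded ('valid') convolution or pooling layer."""
    return (size - kernel) // stride + 1

size = 28 + 2 * 3              # zero-padding of 3 on each side -> 34
trace = [size]
for _ in range(2):             # two identical stages (spatially)
    size = out_size(size, 3)   # first 3x3 convolution, stride 1
    trace.append(size)
    size = out_size(size, 3)   # second 3x3 convolution, stride 1
    trace.append(size)
    size = out_size(size, 3, stride=2)  # 3x3 max pooling, stride 2
    trace.append(size)

print(trace)                   # [34, 32, 30, 14, 12, 10, 4]

# The second stage of customCNN uses 128 filters, so flattening gives:
flattened = size * size * 128
print(flattened)               # 2048
```

The first three values (34, 32, 30) match the output shapes reported by `model.summary()` for the padding layer and the first two convolutions.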

In [30]:
def customCNN(input_shape = (28, 28, 1), classes = 10):
    """
    Implementation of a small custom CNN: zero-padding, two convolutional
    stages (64 and 128 filters) and two dense layers.

    Arguments:
    input_shape -- shape of the images of the dataset
    classes -- integer, number of classes

    Returns:
    model -- a Model() instance in TensorFlow
    """
    
    # Define the input as a tensor with shape input_shape
    X_input = Input(input_shape)
    
    # Zero-Padding
    X = ZeroPadding2D((3, 3))(X_input)
    
    # Stage 1
    X = Conv2D(64, (3, 3), strides = (1, 1), kernel_initializer = glorot_uniform(seed=0))(X)
    X = Activation('relu')(X)
    X = Conv2D(64, (3, 3), strides = (1, 1), kernel_initializer = glorot_uniform(seed=0))(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    X = Conv2D(128, (3, 3), strides = (1, 1), kernel_initializer = glorot_uniform(seed=0))(X)
    X = Activation('relu')(X)
    X = Conv2D(128, (3, 3), strides = (1, 1), kernel_initializer = glorot_uniform(seed=0))(X)
    X = Activation('relu')(X)
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)
    
    # Stage 3 
    X = Flatten()(X)
    X = Dense(classes, activation='relu', kernel_initializer = glorot_uniform(seed=0))(X)
    X = Dense(classes, activation='softmax', kernel_initializer = glorot_uniform(seed=0))(X)

    model = Model(inputs = X_input, outputs = X)

    return model
    

In [31]:
# Initialize the model and set the number of classes
model = customCNN(input_shape = (28, 28, 1), classes = 10)
# summary() prints the table itself and returns None, so no print() is needed
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 zero_padding2d_1 (ZeroPaddi  (None, 34, 34, 1)        0         
 ng2D)                                                           
                                                                 
 conv2d_4 (Conv2D)           (None, 32, 32, 64)        640       
                                                                 
 activation_4 (Activation)   (None, 32, 32, 64)        0         
                                                                 
 conv2d_5 (Conv2D)           (None, 30, 30, 64)        36928     
                                                                 
 activation_5 (Activation)   (None, 30, 30, 64)        0         
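
The parameter counts in the summary can be checked by hand: a convolution layer has `kernel * kernel * in_channels * filters` weights plus one bias per filter, and a dense layer has `inputs * units` weights plus one bias per unit. The sketch below applies this to the layers of `customCNN`. The helper names are hypothetical, and the flattened size of 4 * 4 * 128 assumes the feature map that results from the unpadded convolutions and pooling in the model definition.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units

counts = [
    conv2d_params(3, 1, 64),       # first convolution of stage 1
    conv2d_params(3, 64, 64),      # second convolution of stage 1
    conv2d_params(3, 64, 128),     # first convolution of stage 2
    conv2d_params(3, 128, 128),    # second convolution of stage 2
    dense_params(4 * 4 * 128, 10), # dense layer after flattening
    dense_params(10, 10),          # softmax output layer
]
print(counts[:2])   # [640, 36928]
print(sum(counts))  # 279608
```

The first two values agree with the `Param #` column shown for `conv2d_4` and `conv2d_5`.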
                                                           

## Train the model

Add your loss function, optimizer and metrics. Compile the model and you're ready to train your custom TensorFlow model!

In [11]:
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.RMSprop(0.001),
              metrics=["accuracy"])
EPOCHS = 3
model.fit(train_batches,
          epochs=EPOCHS,
          validation_data=validation_batches)


Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x25a6a3df880>
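
For reference, `sparse_categorical_crossentropy` takes integer class labels (as MNIST provides) and, for each example, evaluates to the negative log of the probability the model assigns to the true class. The pure-Python sketch below illustrates that definition for a single example. The probability vector is made up for illustration and is not output from the model above.

```python
import math

def sparse_categorical_crossentropy(probs, label):
    """Loss for one example: -log of the predicted probability of the true class."""
    return -math.log(probs[label])

# Hypothetical softmax output over 3 classes.
probs = [0.1, 0.7, 0.2]
loss_correct = sparse_categorical_crossentropy(probs, 1)  # true class has high probability
loss_wrong = sparse_categorical_crossentropy(probs, 0)    # true class has low probability
print(round(loss_correct, 4), round(loss_wrong, 4))       # 0.3567 2.3026
```

The loss grows as the probability assigned to the true class shrinks, which is what the optimizer pushes against during `model.fit`.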