Convolutional Neural Networks are among the most effective models for tasks 
where the spatial arrangement of the data matters. Computer Vision and Pattern 
Recognition fall into this category. Convolutional Neural Networks, henceforth 
called CNNs, were pioneered by Yann LeCun et al. in 1998.

CNNs are feed-forward networks, just like vanilla neural networks. However, 
they are locally connected: each unit sees only a small patch of its input, 
whereas vanilla neural networks are fully connected.
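
To make the difference concrete, here is a rough sketch that counts the weights
of one fully connected layer and one convolutional layer on a 28x28 grayscale
image. The width of 64 units for the fully connected layer is an assumption
chosen only for illustration; the 64 filters of size 3x3 match the first
convolution used later in this notebook.

```python
# Compare trainable parameters for a single layer on a 28x28 grayscale image.
height, width, channels = 28, 28, 1
units = 64     # hypothetical width of a fully connected layer
filters = 64   # number of convolution filters
kernel = 3     # each filter is kernel x kernel

# Fully connected: every pixel connects to every unit, plus one bias per unit.
dense_params = height * width * channels * units + units

# Convolution: each filter shares kernel*kernel*channels weights across the
# whole image, plus one bias per filter.
conv_params = kernel * kernel * channels * filters + filters

print(dense_params)  # 50240
print(conv_params)   # 640
```

The convolutional layer needs far fewer weights because the same small filter
is reused at every position in the image.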


CNNs work by sliding small filters over the image, so a specific pattern or 
feature can be detected wherever it appears.

CNNs are very effective at detecting such patterns, which is why Deep Computer 
Vision relies so heavily on them.
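
As a minimal sketch of pattern detection, the plain-Python function below
slides a small filter over an image and records how strongly each window
matches it. The 4x4 image, the 2x2 vertical-edge filter and the function name
correlate2d are all made up for illustration; this is not how Keras implements
its layers internally.

```python
def correlate2d(image, kernel, stride=1):
    """Slide kernel over image (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the window
            row.append(sum(image[r + a][c + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        output.append(row)
    return output

# Toy image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Filter that responds to a dark-to-bright transition from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = correlate2d(image, kernel)
print(response)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The response is non-zero only in the middle column, where the windows straddle
the edge between the dark and bright halves.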

The number of pixels by which the convolution filter moves at each step is 
called the stride, and it can be greater than one.

We use padding so that the output keeps the same spatial dimensions as the 
input.
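
The effect of stride and padding on the output size can be sketched with a
small helper. The function name conv_output_size is hypothetical; the two
formulas follow the 'same' and 'valid' padding conventions used by Keras
convolution layers.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis of a convolution."""
    if padding == "same":
        # Input is zero-padded so that only the stride shrinks the output.
        return math.ceil(n / stride)
    if padding == "valid":
        # No padding: the filter must fit entirely inside the input.
        return (n - kernel) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

print(conv_output_size(28, 3, stride=1, padding="same"))   # 28
print(conv_output_size(28, 3, stride=1, padding="valid"))  # 26
print(conv_output_size(28, 3, stride=2, padding="same"))   # 14
print(conv_output_size(28, 3, stride=2, padding="valid"))  # 13
```

With a stride of one and 'same' padding, a 28x28 input stays 28x28.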

In [2]:
import keras
from keras.datasets import mnist
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Input
from keras.models import Model
from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler, ModelCheckpoint
import os

In [3]:
#load the mnist dataset
(train_x, train_y), (test_x, test_y) = mnist.load_data()

In [4]:
#normalize the data
train_x = train_x.astype('float32')/255
test_x = test_x.astype('float32')/255

In [5]:
train_x.shape

(60000, 28, 28)

In [6]:
test_x.shape

(10000, 28, 28)

In [8]:
test_y.shape

(10000,)

In [10]:
train_y.shape

(60000,)

In [11]:
#add a channel axis: reshape each image (the features, x) from (28,28) to (28,28,1)
train_x = train_x.reshape(train_x.shape[0], 28,28,1)
test_x = test_x.reshape(test_x.shape[0], 28,28,1)


In [12]:
train_x.shape, test_x.shape


((60000, 28, 28, 1), (10000, 28, 28, 1))

In [13]:
#one-hot encode the labels (y) into 10 classes
train_y = keras.utils.to_categorical(train_y, 10)
test_y = keras.utils.to_categorical(test_y, 10)

In [22]:
train_y.shape, test_y.shape

((60000, 10), (10000, 10))
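
For intuition, one-hot encoding turns each integer label into a vector with a
single 1 at the index of the class. The helper below is a plain-Python sketch
with a hypothetical name (one_hot); it returns nested lists rather than the
NumPy array that keras.utils.to_categorical returns.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]

encoded = one_hot([3, 0], num_classes=4)
print(encoded)  # [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
```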

In [25]:
def MiniModel(input_shape):
  images = Input(input_shape)
  net = Conv2D(filters= 64, kernel_size=[3,3], strides= [1,1], padding= 'same', activation= 'relu')(images)
  net = Conv2D(filters= 64, kernel_size=[3,3], strides= [1,1], padding= 'same', activation= 'relu')(net)
  net = MaxPooling2D(pool_size= (2,2))(net) 
  net = Conv2D(filters= 128, kernel_size=[3,3], strides= [1,1], padding= 'same', activation= 'relu')(net)
  net = Conv2D(filters= 128, kernel_size=[3,3], strides= [1,1], padding= 'same', activation= 'relu')(net)
  net = Flatten()(net)
  net = Dense(units = 10, activation= 'softmax')(net)

  model = Model(inputs = images, outputs = net)

  return model

In [26]:
input_shape = (28,28,1)
model = MiniModel(input_shape)

In [27]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_4 (Conv2D)           (None, 28, 28, 64)        640       
                                                                 
 conv2d_5 (Conv2D)           (None, 28, 28, 64)        36928     
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 14, 14, 64)        0         
 g2D)                                                            
                                                                 
 conv2d_6 (Conv2D)           (None, 14, 14, 128)       73856     
                                                                 
 conv2d_7 (Conv2D)           (None, 14, 14, 128)       147584    
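
The parameter counts in the summary can be checked by hand. The sketch below
uses two hypothetical helpers (conv2d_params and dense_params) and the layer
sizes from MiniModel. The Flatten and Dense rows are cut off in the printed
summary above, so the last two figures are derived from the model definition
rather than copied from the output.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

# (input channels, filters) for the four 3x3 convolutions in MiniModel.
conv_layers = [(1, 64), (64, 64), (64, 128), (128, 128)]
conv_counts = [conv2d_params(3, 3, c_in, c_out) for c_in, c_out in conv_layers]
print(conv_counts)  # [640, 36928, 73856, 147584]

# Flatten turns the (14, 14, 128) feature map into one vector.
flat_size = 14 * 14 * 128
dense_count = dense_params(flat_size, 10)
print(flat_size, dense_count)  # 25088 250890

print(sum(conv_counts) + dense_count)  # 509898
```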
                                                             

In [28]:
# define the function for the learning rate
def lr_schedule(epoch):
  lr = 0.1

  if epoch > 15:
    lr = lr/ 100

  elif epoch > 10:
    lr = lr/10

  elif epoch > 5:
    lr = lr/5

  print('Learning Rate: ', lr)

  return lr
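
To see the whole schedule at a glance, the same thresholds can be evaluated for
all 20 epochs without training. The function lr_at below is a hypothetical,
non-printing copy of lr_schedule, written only for this illustration; epochs
are zero-indexed, as Keras passes them to the schedule function.

```python
def lr_at(epoch, base_lr=0.1):
    """Same thresholds as lr_schedule, without the print."""
    if epoch > 15:
        return base_lr / 100
    if epoch > 10:
        return base_lr / 10
    if epoch > 5:
        return base_lr / 5
    return base_lr

schedule = [lr_at(epoch) for epoch in range(20)]

# Count how many epochs are spent at each learning rate.
for lr in sorted(set(schedule), reverse=True):
    print(lr, schedule.count(lr))
```

The first six epochs (indices 0 to 5) run at the base rate, the next five at
one fifth of it, the next five at one tenth, and the last four at one
hundredth.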

In [29]:
#wrap the schedule function in a LearningRateScheduler callback
lr_scheduler = LearningRateScheduler(lr_schedule) 

In [30]:
#directory in which to create models
save_direc = os.path.join(os.getcwd(), 'mnistsavedmodels_cnn')

#name of models file
model_name= 'mnistmodel_cnn.{epoch:03d}.h5'

#create the directory if it doesn't exist
os.makedirs(save_direc, exist_ok=True)

#join the directory with model path
modelpath = os.path.join(save_direc, model_name)
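
The {epoch:03d} part of the file name is an ordinary Python format field: the
epoch number is written with at least three digits, padded with zeros. A small
sketch of the names this template produces, assuming epochs are numbered from 1
as in the training log:

```python
model_name = 'mnistmodel_cnn.{epoch:03d}.h5'

# Fill the template for a few example epoch numbers.
filenames = [model_name.format(epoch=e) for e in (1, 2, 20)]
print(filenames)
# ['mnistmodel_cnn.001.h5', 'mnistmodel_cnn.002.h5', 'mnistmodel_cnn.020.h5']
```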

In [31]:
checkpoint = ModelCheckpoint(filepath= modelpath,
                             monitor= 'val_accuracy',  #name logged for metrics=['accuracy']
                             verbose= 1,
                             save_best_only= True,
                             save_freq= 'epoch')  #replaces the deprecated period=1



In [33]:
model.compile(optimizer= SGD(learning_rate= lr_schedule(0)), loss= 'categorical_crossentropy', metrics = ['accuracy'])

Learning Rate:  0.1


In [37]:
model.fit(train_x, train_y, batch_size = 64, epochs = 20, validation_split= 0.1, verbose= 1, callbacks=[checkpoint, lr_scheduler])

Learning Rate:  0.1
Epoch 1/20
Learning Rate:  0.1
Epoch 2/20
Learning Rate:  0.1
Epoch 3/20
Learning Rate:  0.1
Epoch 4/20
Learning Rate:  0.1
Epoch 5/20
Learning Rate:  0.1
Epoch 6/20
Learning Rate:  0.02
Epoch 7/20
Learning Rate:  0.02
Epoch 8/20
Learning Rate:  0.02
Epoch 9/20
Learning Rate:  0.02
Epoch 10/20
Learning Rate:  0.02
Epoch 11/20
Learning Rate:  0.01
Epoch 12/20
Learning Rate:  0.01
Epoch 13/20
Learning Rate:  0.01
Epoch 14/20
Learning Rate:  0.01
Epoch 15/20
Learning Rate:  0.01
Epoch 16/20
Learning Rate:  0.001
Epoch 17/20
Learning Rate:  0.001
Epoch 18/20
Learning Rate:  0.001
Epoch 19/20
Learning Rate:  0.001
Epoch 20/20


<keras.src.callbacks.History at 0x1e9a85b7f90>

In [38]:
#Evaluate the model on the test dataset; evaluate returns [loss, accuracy]
loss, accuracy = model.evaluate(x=test_x,y=test_y,batch_size=64)

print("Accuracy: ",accuracy)

Accuracy:  0.9908999800682068
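
The accuracy reported here is the fraction of test images whose most probable
predicted class matches the true class. A plain-Python sketch of that
calculation follows; the helper names (argmax, accuracy_score) and the four
three-class predictions are invented for illustration and are not taken from
the model above.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_score(probabilities, one_hot_labels):
    """Fraction of rows where the predicted class equals the true class."""
    correct = sum(argmax(p) == argmax(y)
                  for p, y in zip(probabilities, one_hot_labels))
    return correct / len(one_hot_labels)

# Made-up softmax outputs and one-hot labels for four samples.
probabilities = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

print(accuracy_score(probabilities, labels))  # 0.5
```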


In [39]:
model.save('mnist_cnn.h5')

