### Classify handwritten digits using a CNN architecture
1. Dense layers learn global patterns in their input feature space - for an MNIST digit, patterns involving all pixels
2. Convolution layers learn local patterns - in the case of images, patterns found in small 2D windows of the inputs
3. The patterns a CNN learns are translation invariant - after learning a certain pattern in the lower-right corner of a picture, a convnet can recognize it anywhere, for example in the upper-left corner.
4. A densely connected network would have to learn the pattern anew if it appeared at a new location.
5. A CNN can learn spatial hierarchies of patterns - a first convolution layer learns small local patterns such as edges, a second convolution layer learns larger patterns made of the features of the first layer, and so on.
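6. padding = "valid" - no padding; only window positions that fit entirely inside the input are used, so the output is smaller than the input
7. padding = "same" - the input is zero-padded so that, with stride 1, the output has the same width and height as the input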
8. the distance between two successive windows is a parameter of the convolution, called the stride
9. convolution is typically done with 3 × 3 windows and stride 1 (the default, meaning the window moves one pixel at a time)
10. max pooling is usually done with 2 × 2 windows and stride 2, in order to downsample the feature maps by a factor of 2
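
The effect of window size, stride, and padding on the feature-map size can be checked with a little arithmetic. The following is a minimal sketch in plain Python, not Keras code; the helper name `output_size` is made up for illustration. It follows one axis of a 28 × 28 image through three 3 × 3 convolutions interleaved with two 2 × 2, stride-2 max-pooling layers, all with "valid" padding.

```python
import math

def output_size(n, window, stride=1, padding="valid"):
    """Spatial size along one axis after a convolution or pooling window.

    Hypothetical helper for illustration only; not part of Keras.
    """
    if padding == "valid":
        # No padding: only windows that fit entirely inside the input count.
        return (n - window) // stride + 1
    if padding == "same":
        # Zero-padding is added so the output size depends only on the stride.
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")

# conv -> pool -> conv -> pool -> conv on a 28-pixel-wide input
size = 28
sizes = []
for kind in ["conv", "pool", "conv", "pool", "conv"]:
    if kind == "conv":
        size = output_size(size, window=3, stride=1)
    else:
        size = output_size(size, window=2, stride=2)
    sizes.append(size)

print(sizes)  # [26, 13, 11, 5, 3]
```

With `padding="same"` and stride 1 the same helper returns the input size unchanged, which is the behaviour described in point 7.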

In [1]:
# importing necessary libraries
from keras.datasets import mnist
from keras.utils import to_categorical
from keras import layers
from keras import models

In [2]:
# Loading the MNIST dataset from Keras, already split into train and test sets
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preparing the image data as per network requirement
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# Preparing the labels as per network requirement
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
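
To make the preparation step concrete, here is a small sketch in plain Python of what the two transformations do conceptually: scaling pixel values from the 0 to 255 range into 0 to 1, and turning integer labels into one-hot vectors. The helpers `scale_pixels` and `one_hot` are hypothetical names used only for illustration; the cell above relies on NumPy arithmetic and Keras's `to_categorical` instead.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel intensities (0..255) to floats in [0, 1]."""
    return [p / 255 for p in pixels]

def one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index."""
    encoded = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded

scaled = scale_pixels([0, 51, 255])
encoded = one_hot([3, 0], num_classes=10)

print(scaled)       # [0.0, 0.2, 1.0]
print(encoded[0])   # 1.0 at index 3, zeros elsewhere
```

One-hot labels are what the `categorical_crossentropy` loss expects, with one output unit per digit class.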

In [3]:
#  network architecture
# Sequential class used only for linear stacks of layers
# functional API used for directed acyclic graphs of layers
model = models.Sequential()
# Convolution layer with 32 filters and 3X3 kernel
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# Maxpooling layer 
model.add(layers.MaxPooling2D((2, 2)))
# Convolution layer with 64 filters and 3X3 kernel
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Maxpooling layer 
model.add(layers.MaxPooling2D((2, 2)))
# Convolution layer with 64 filters and 3X3 kernel
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Flattening layer
model.add(layers.Flatten())
# fully connected layer with 64 units
model.add(layers.Dense(64, activation='relu'))
# fully connected output layer with 10 units, one per digit class
model.add(layers.Dense(10, activation='softmax'))
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 64)          36928     
                                                                 
 flatten (Flatten)           (None, 576)               0
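
The parameter counts in the summary can be reproduced by hand. The sketch below uses two hypothetical helpers, `conv2d_params` and `dense_params`, and assumes each layer has a bias term (the Keras default). The two dense layers are not shown in the truncated summary above, so their counts here are computed from the layer definitions rather than read from the output.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # Each unit has one weight per input plus one bias.
    return (inputs + 1) * units

conv_counts = [
    conv2d_params(3, 3, 1, 32),    # conv2d:   1 input channel, 32 filters
    conv2d_params(3, 3, 32, 64),   # conv2d_1: 32 input channels, 64 filters
    conv2d_params(3, 3, 64, 64),   # conv2d_2: 64 input channels, 64 filters
]

# Flatten turns the final (3, 3, 64) feature map into a vector.
flat_size = 3 * 3 * 64

dense_counts = [
    dense_params(flat_size, 64),   # Dense(64)
    dense_params(64, 10),          # Dense(10)
]

print(conv_counts)   # [320, 18496, 36928]
print(flat_size)     # 576
print(dense_counts)  # [36928, 650]
```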

In [4]:
# compilation step
# binary crossentropy for a two-class classification problem
# categorical crossentropy for a many-class classification problem
# mean squared error for a regression problem
# connectionist temporal classification for a sequence-learning problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# training step
model.fit(train_images, train_labels, epochs=5, batch_size=64)

# evaluation step
test_loss, test_acc = model.evaluate(test_images, test_labels)
test_acc

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


0.9919999837875366
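
To show what the loss and metric from the compilation step measure, here is a minimal sketch in plain Python. It is not Keras's implementation: the function names, the `eps` clipping value, and the three-class toy data are assumptions chosen for illustration.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: minus the log of the probability given to the true class."""
    # Clip predictions away from zero so log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(true_rows, pred_rows):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(true_rows, pred_rows))
    return hits / len(true_rows)

# Toy example with 3 classes instead of 10 to keep the rows short.
y_true = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
y_pred = [[0.1, 0.2, 0.7], [0.3, 0.6, 0.1]]

loss_first = categorical_crossentropy(y_true[0], y_pred[0])
acc = accuracy(y_true, y_pred)

print(loss_first)  # equals -log(0.7)
print(acc)         # 0.5: the first sample is classified correctly, the second is not
```

The loss falls as the probability assigned to the correct class approaches 1, while accuracy only checks whether the most probable class is the right one.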