<a href="https://colab.research.google.com/github/Aboubacar2012/Deep-Learning-Training/blob/main/Project_CNN_Handwritten_Digit_Recognition_Simple_Convolutional_Neural_Network_for_MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Simple Convolutional Neural Network for MNIST

Now that we have seen how to load the MNIST dataset and train a simple Multilayer Perceptron
model on it, it is time to develop a more sophisticated convolutional neural network or CNN
model. Keras does provide a lot of capability for creating convolutional neural networks. In this
section we will create a simple CNN for MNIST that demonstrates how to use all of the aspects
of a modern CNN implementation, including Convolutional layers, Pooling layers and Dropout
layers. The first step is to import the classes and functions needed.

In [1]:
# Import classes and functions
import numpy 
from keras.datasets import mnist 
from keras.models import Sequential 
from keras.layers import Dense 
from keras.layers import Dropout 
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_data_format('channels_first')

In [2]:
# fix random for reproducibility
seed=7
numpy.random.seed(seed)

Next we need to load the MNIST dataset and reshape it so that it is suitable for use training
a CNN. In Keras, the layers used for two-dimensional convolutions expect pixel values with the
dimensions [channels][width][height]. In the case of RGB, the first dimension channels
would be 3 for the red, green and blue components and it would be like having 3 image inputs
for every color image. In the case of MNIST where the channels values are gray scale, the pixel
dimension is set to 1.

In [3]:
# Load data 
(X_train, y_train),(X_test, y_test)=mnist.load_data()
# Reshape to be [sample][channels][width][height]
X_train=X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test=X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [4]:
#Normalize input from 0-255 to 0-1
X_train=X_train/255
X_test=X_test/255

#One hot encode outputs 
y_train=np_utils.to_categorical(y_train)
y_test=np_utils.to_categorical(y_test)
num_classes=y_test.shape[1]

Next we define our neural network model. Convolution neural networks are more complex than standard Mutilayer Perceptron, so will start by using a simple structure to begin with that uses all of the elements for state-of-the-art results. Below summarizes the network architecture.

- The first hidden layer is a convolutional layer called a Convolution2D. The layer has 32 feature maps, which with the size of 5x5 and a rectifier activation function. This is the input layer, expecting images with the structure outline above 

- Next we define a pooling layer that takes the maximun value called MaxPooling2D. It is configured with a pool size of 2x2.

- The next layer is a regularization layer using dropout called Dropout. It is configured to randomly exclude 20% of neurons in the layer in order to reduce overfitting.

- Next is a layer that converts the 2D matrix data to a vector called Flatten. It allows the output to be processed by standard fully connected layers.

- Next a fully connected layer with 128 neurons and rectifier activation function is used.

- Finally, the ouput layer has 10 neurons for the 10 classes and a softmax activation function to output probability-like prediction for each class.

As before, the model is trained using logarithm loss and the ADAM gradient descent algorithm.

In [5]:
# Define and Compile CNN Model
def baseline_model():
  # create model
  model = Sequential()
  model.add(Convolution2D(32, 5, 5, input_shape=(1, 28, 28), activation='relu'))
  model.add(MaxPooling2D(pool_size=(2, 2)))
  model.add(Dropout(0.2))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dense(num_classes, activation='softmax'))
  # Compile model
  model.compile(loss='categorical_crossentropy' , optimizer='adam' , metrics=['accuracy'])
  return model

In [6]:
#Fit and Evaluate The CNN Model
model=baseline_model()
# Fit the model 
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
# Final evaluation of the model 
scores=model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

Epoch 1/10
300/300 - 14s - loss: 1.1663 - accuracy: 0.6277 - val_loss: 0.5820 - val_accuracy: 0.8306 - 14s/epoch - 48ms/step
Epoch 2/10
300/300 - 1s - loss: 0.6321 - accuracy: 0.8008 - val_loss: 0.4444 - val_accuracy: 0.8659 - 1s/epoch - 4ms/step
Epoch 3/10
300/300 - 1s - loss: 0.5298 - accuracy: 0.8320 - val_loss: 0.3731 - val_accuracy: 0.8893 - 1s/epoch - 4ms/step
Epoch 4/10
300/300 - 1s - loss: 0.4737 - accuracy: 0.8499 - val_loss: 0.3418 - val_accuracy: 0.8972 - 1s/epoch - 4ms/step
Epoch 5/10
300/300 - 1s - loss: 0.4397 - accuracy: 0.8618 - val_loss: 0.3084 - val_accuracy: 0.9081 - 1s/epoch - 4ms/step
Epoch 6/10
300/300 - 1s - loss: 0.4108 - accuracy: 0.8699 - val_loss: 0.2888 - val_accuracy: 0.9110 - 1s/epoch - 4ms/step
Epoch 7/10
300/300 - 1s - loss: 0.3889 - accuracy: 0.8764 - val_loss: 0.2745 - val_accuracy: 0.9173 - 1s/epoch - 4ms/step
Epoch 8/10
300/300 - 1s - loss: 0.3686 - accuracy: 0.8827 - val_loss: 0.2613 - val_accuracy: 0.9195 - 1s/epoch - 4ms/step
Epoch 9/10
300/300 - 