# CNN
The first thing we need to know is the elements that are included in the CNN operation:



*   Input image
*   Convolutional Neural Network


*   Output label (image class)








# Importing the libraries

In [1]:
#importing necessary libraries
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt #viusalization library
import random # used to generate  random numbers

from keras.datasets import mnist #importing the dataset
from keras.models import Sequential #Sequential groups a linear stack of layers into a tf.keras.Model.


from keras.layers.core import Dense, Dropout, Activation # importing the core layers 
from keras.utils import np_utils #from the utils module importing numpy utils

**Random** - Python Random module is an in-built module of Python which is used to generate random. 

**Keras** : Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow.

From the keras library we call upon the datasets class and from there we import mnist dataset.
From the keras models module we import sequential class. 
The Sequential model, which is very straightforward (a simple list of layers), but is limited to single-input, single-output stacks of layers.

**Layers**: Layers are the basic building blocks of neural networks in Keras.

From layers api we import te dense, activatin and dropout classes from the core layers module.



# Data Preprocessing

Importing Preorocessing libraries

In [2]:
from keras.preprocessing.image import ImageDataGenerator #from keras image data preprocessing library importing imageDataGenerator to generate images
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D,GlobalAveragePooling2D,Flatten 
from keras.layers.normalization.batch_normalization import BatchNormalization

**Conv2D**: We find this under the layers module .2D convolution layer (e.g. spatial convolution over images).
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of output.

**MaxPooling2D**: this is used during the max pooling step.

**ZeroPadding2D**: this layer can add rows and columns of zeros at the top, bottom, left and right side of an image tensor.

**Flatten & Global Average Pooling**

**Flatten** will take a tensor of any shape and transform it into a one dimensional tensor (plus the samples dimension) but keeping all values in the tensor. For example a tensor (samples, 10, 20, 1) will be flattened to (samples, 10 * 20 * 1). it converts to 1D array that is then fed to the ANN

**GlobalAveragePooling2D** does something different. This function is used to operate global average pooling for given data.
For example, suppose we have an input feature map of dimensions height(h), width, and depth. When we pass this input layer into the global average pooling operation then it will calculate the average value of every single map and returns the average value to the output node.

 **Batch Normalization** :Layer that normalizes its inputs.

Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.

## Loading the Dataset 

Splitting into training and test set

In [3]:
(X_train,y_train),(X_test,y_test)=mnist.load_data() #splitting into training and test set

#Checking the shape of the data.
print("X_train_shape", X_train.shape)
print("X_test_shape", X_test.shape)
print("y_train_shape", y_train.shape)
print("y_test_shape", y_test.shape)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
X_train_shape (60000, 28, 28)
X_test_shape (10000, 28, 28)
y_train_shape (60000,)
y_test_shape (10000,)


## Feature Scaling

In [4]:
X_train=X_train.reshape(60000,28,28,1) #reshaping the data, adding 1 to specify that our image is a black and white image 
X_test=X_test.reshape(10000,28,28,1)

X_train=X_train.astype('float32') #converting the data into a float type
X_test=X_test.astype('float32')

X_train /=255 # since each pixel takes a valiue between 0 and 255 so dividing by 255 we are features scaling
X_test /= 255

## Encoding the catgorical columns in the dataset

In [5]:
nb_classes=10 #we have 10 classes in our data

Y_train=np_utils.to_categorical(y_train,nb_classes)  # to_categorical class: Converts a class vector (integers) to binary class matrix.
Y_test=np_utils.to_categorical(y_test,nb_classes)


# Building the CNN

## Initialising the CNN

In [6]:
model= Sequential() #buiding an instance model which is a sequence of layers

## Step 1: Convolution

Adding the first convolution layer

In [7]:
#Convolution Layer 1
model.add(Conv2D(32,(3,3),input_shape=(28,28,1)))

**add method** is called on model object to add our very first layer which will be an object of Conv2D class.
from the keras library we call layers module from where we call the Conv2D class
the filters parameter is for how many feature detectors we want and the kernel size specifies the no of rows and columns of the feature map(3x3)
so  32 is the no of feature detectors and the size of those are 3x3.

**input_shape**: the input_shape specifies the shape.Since here we are working with black-white images and we resized our images earlier to (28,28)
 so we specify the input_shape as (28,28,1) where 1 specifies that its a black & white image  . For colored we write 3 instead of 1  because of RGB

Normalizaion

In [8]:
model.add(BatchNormalization(axis=-1))

It is the layer that normalizes its inputs.
Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.

## Adding the  activation

In [9]:
convLayer01 = Activation('relu')  #adding the rectifier activation function

Activations can either be used through an Activation layer, or through the activation argument supported by all forward layers.
**relu**:Applies the rectified linear unit activation function.

With default values, this returns the standard ReLU activation: max(x, 0), the element-wise maximum of 0 and the input tensor.

## Adding the 2nd layer

In [10]:
#ConvolutionLayer2
model.add(Conv2D(32,(3,3)))



**NOTE** :  here we dont need the input_shape parameter as when add the first layer we already connected it to the input layer

In [11]:
model.add(BatchNormalization(axis=-1)) #we follow similar batch normalization

**Note**: we might have confusion regarding axis=-1

When we compute a BatchNormalization along an axis, we preserve the dimensions of the array, and we normalize with respect to the mean and standard deviation over every other axis. So in your 2D example BatchNormalization with axis=1 is subtracting the mean for axis=0, just as you expect.

## STEP 2: MaxPooling

In [12]:
convLayer02=MaxPooling2D(pool_size=(2,2)) 

 layers module having the MaxPool2d class
 
 **pool_size**=2 i.e. we apply a 2x2 max pool filter

Adding the max pooling layer to our model

In [13]:
model.add(convLayer02)

Adding third convolution layer

In [14]:
#convolutionLayer3
model.add(Conv2D(64,(3,3)))
model.add(BatchNormalization(axis=-1))
convLayer03=Activation('relu')
model.add(convLayer03)

Adding the Fourth convolution layer and then adding a max pooling layer

In [15]:
#convolutionLayer4
model.add(Conv2D(64,(3,3)))
model.add(BatchNormalization(axis=-1))
convLayer03=Activation('relu')
convLayer04=MaxPooling2D(pool_size=(2,2))
model.add(convLayer04)





## Step 3:  Flattening


converting the convolutional layer and pooling layer into a one dimensional vector that we will feed to our ANN
from keras library we call the layers module and the flatten class

In [16]:
model.add(Flatten())

## Step 4: Fully Connected Layer

In [17]:
#Fully Connected Layer 5
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))

 **Dense** is used to add a fully connected to our model.
 we call the dense class which takes parameters units.
units is the no of hidden layers you want to have, with high no of neurons we get better accuracy. Here we take 512 neurons.
we perform batch normalization and then add a rectifier activation function

Adding the output layer

In [18]:
#Fully Connected layer 6
model.add(Dropout(0.2))
model.add(Dense(10))
model.add(Activation('softmax'))

Here we add the number of units=10 as we have 10 classes in our data.
pplies Dropout to the input.

The **Dropout layer** randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) such that the sum over all inputs is unchanged.

**softmax activation function**: Softmax converts a vector of values to a probability distribution.The elements of the output vector are in range (0, 1) and sum to 1. Softmax is often used as the activation for the last layer of a classification network because the result could be interpreted as a probability distribution.

## Summary

In [19]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 batch_normalization (BatchN  (None, 26, 26, 32)       128       
 ormalization)                                                   
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 32)        9248      
                                                                 
 batch_normalization_1 (Batc  (None, 24, 24, 32)       128       
 hNormalization)                                                 
                                                                 
 max_pooling2d (MaxPooling2D  (None, 12, 12, 32)       0         
 )                                                               
                                                        

We get the total trainable parameters as 596,330
non trainable parameters as 1408




In neural networks in general, and in deep learning algorithms (CNN, DNN, etc.) that are also based on neural networks, trainable parameters are parameters that will be learned by the model during the training procedure such as weights and biases.


## Training the model

## Compiling

**.compile** : configures the model for training.

**Adam**: Optimizer that implements the Adam algorithm.Adam optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.Adam,the optimizer updates the weight.

**loss**: The purpose of loss functions is to compute the quantity that a model should seek to minimize during training.

**Note**:For binary prediction the loss function is binary_crossentropy and here since we have more than 2 classes i.e. 10 classes our loss function is 

**Categorical crossentropy**: Computes the crossentropy loss between the labels and predictions.We use this crossentropy loss function when there are two or more label classes. We expect labels to be provided in a one_hot representation.

**Accuracy** : the metrics for evaluation. The accuracy class calculates how often predictions equal labels.

In [20]:
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=['accuracy'])

 We are applying transformations in the images to avoid overfitting i.e. image augmentation.

test_gen is an instance of ImagedataGenerator class.


In [21]:
# data augmentation prevents overfitting by slightly changing the data randomly
# Keras has a great built-in feature to do automatic augmentation
gen = ImageDataGenerator(rotation_range=8, width_shift_range=0.08,
                         shear_range=0.3,
                         height_shift_range=0.08, zoom_range=0.08)
test_gen = ImageDataGenerator()

In Keras via the keras.preprocessing.image.ImageDataGenerator class we can:


*   configure random transformations and normalization operations to be done on your image data during training

*   instantiate generators of augmented image batches (and their labels) via .flow(data, labels) or .flow_from_directory(directory). These generators can then be used with the Keras model methods that accept data generators as inputs, fit_generator, evaluate_generator and predict_generator.

**rotation_range** is a value in degrees (0-180), a range within which to randomly rotate pictures

**width_shift** and **height_shift** are ranges (as a fraction of total width or height) within which to randomly translate pictures vertically or horizontally.

**shear_range** is for randomly applying shearing transformations
**zoom_range** is for randomly zooming inside pictures




We can feed our augumented data in batches.


In [22]:
train_generator= gen.flow(X_train,Y_train,batch_size=128)
test_generator= test_gen.flow(X_test,Y_test,batch_size=128)

## Fitting our model

.fit method on ann object to train our model
batch training is more efficient in artificial training since training in batches is efficient.


We call fit(), which will train the model by slicing the data into "batches" of size batch_size, and repeatedly iterating over the entire dataset for a given number of epochs.

defualt value of batch training =32, we have taken 128
steps per epoch= total training data divided by no of batches.

epochs =5 i.e. we will train the data in batches and will repeatedly iterate over the entire dataset 5 times.

verbose=1. Verbosity mode 1 shows  progress bar

validation_data: Data on which to evaluate the loss and any model metrics at the end of each epoch. The model will not be trained on this data. We will evaluate on test data.

validation_steps: validation of training data is also done in batches .

In [23]:
model.fit_generator(train_generator,steps_per_epoch=60000//128,epochs=5,verbose=1,
                    validation_data=test_generator,
                    validation_steps=10000//128)

  This is separate from the ipykernel package so we can avoid doing imports until


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f19b3ba9f50>

## Evaluation

Returns the loss value & metrics values for the model in test mode.
Test score and test accuracy are returned

In [24]:
score=model.evaluate(X_test,Y_test)
print('Test score:', score[0])
print('Test accuracy:', score[1])

Test score: 0.04625951871275902
Test accuracy: 0.984000027179718


The test loss is 0.0325 and the accuracy of our model is 98.9 %.
The test score is 3%

## Prediction

In [34]:
Y_pred = model.predict(X_test) 
Y_pred = np.argmax(Y_pred, axis = -1)[:5] #predicting the output array of the first five images
Y_true = np.argmax(Y_test,axis = -1)[:5]  #true output of first five images

#we compare the true value with predicted value
print(Y_pred) 
print(Y_true)

[7 2 1 0 4]
[7 2 1 0 4]


The output of both array is identical and it indicate our model correctly predicts the first five images.