# Convolutional Neural Network

#### Introduction

A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that takes an input image, assigns importance (learnable weights and biases) to various aspects/objects in the image, and learns to differentiate one from another. A ConvNet requires much less pre-processing than many other classification algorithms: whereas in primitive methods filters are hand-engineered, with enough training ConvNets learn these filters/characteristics themselves.
The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the Visual Cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the Receptive Field, and a collection of such overlapping fields covers the entire visual area.

![CNN](cnn.jpeg)

#### Why ConvNets over Feed-Forward Neural Nets?

| ![Feed Forward](feedf.png) | 
|:--:| 
| *Flattening of a 3x3 image matrix into a 9x1 vector* |


An image is nothing but a matrix of pixel values, right? So why not just flatten the image (e.g. a 3x3 image matrix into a 9x1 vector) and feed it to a Multi-Layer Perceptron for classification? Uh... not really.
For extremely basic binary images, this method might reach an average precision score when predicting classes, but it would have little to no accuracy on complex images with pixel dependencies throughout, because a flattened vector no longer encodes which pixels are neighbors.
A ConvNet is able to successfully capture the spatial dependencies in an image (and temporal dependencies, in the case of video) through the application of relevant filters. Because it involves far fewer parameters and reuses the same weights across the whole image, the architecture fits image datasets better. In other words, the network can be trained to understand the sophistication of the image better.
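
To make the parameter argument concrete, here is a small count for a 64x64x3 input, the size used later in this notebook. It compares a dense layer of 128 units fed with the flattened image against a convolutional layer of 32 filters of size 3x3; both layer sizes mirror the model built in Part 2 and are otherwise illustrative.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    # Each filter spans all input channels and has one bias;
    # the same weights are shared across every position in the image
    return kernel_h * kernel_w * in_channels * filters + filters


flat_inputs = 64 * 64 * 3                 # 12288 pixel values after flattening
mlp = dense_params(flat_inputs, 128)
conv = conv2d_params(3, 3, 3, 32)

print(f"Dense on flattened image: {mlp:,} parameters")       # 1,572,992
print(f"Conv2D, 32 filters of 3x3x3: {conv:,} parameters")   # 896
```

The convolutional count does not grow with the image size, because the same 3x3x3 weights are reused at every position; the dense count grows linearly with the number of pixels.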


#### Input Image


| ![Input Image](inputimg.png) | 
|:--:| 
| *4x4x3 RGB Image* |

In the figure, we have an RGB image which has been separated by its three color planes — Red, Green, and Blue. There are a number of such color spaces in which images exist — Grayscale, RGB, HSV, CMYK, etc.
You can imagine how computationally intensive things would get once images reach dimensions such as 8K (7680×4320). The role of the ConvNet is to reduce the images into a form that is easier to process, without losing features that are critical for getting a good prediction. This is important when we design an architecture that is not only good at learning features but also scalable to massive datasets.
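
A back-of-the-envelope calculation shows the scale involved. It assumes three color channels and 32-bit floating-point storage, which is common for network inputs but not the only option.

```python
# Size of one 8K RGB image as a network input (assumes 3 channels, float32 values)
width, height, channels = 7680, 4320, 3
values = width * height * channels      # numbers per image
bytes_fp32 = values * 4                 # 4 bytes per float32 value

print(f"{values:,} values per image")                     # 99,532,800
print(f"{bytes_fp32 / 1e6:.1f} MB per image as float32")  # 398.1

# If the image were flattened, every single dense unit would need this many weights
print(f"{values:,} weights per fully connected unit")
```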

#### Convolution Layer — The Kernel

| ![Convolution Layer](cl.gif) | 
|:--:| 
| *Convolving a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature* |

Image Dimensions = 5 (Height) x 5 (Width) x 1 (Number of channels: 1 for grayscale, 3 for RGB)
In the above demonstration, the green section represents our 5x5x1 input image, I. The element that carries out the convolution operation in the first part of a Convolutional Layer is called the Kernel/Filter, K, shown in yellow. We have selected K as a 3x3x1 matrix.

The Kernel visits 9 positions because Stride Length = 1 (Non-Strided). At each position it multiplies K element-wise with the portion P of the image it is hovering over and sums the products into a single value of the convolved feature.
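
The sliding-window computation can be sketched in NumPy. The 5x5 image and 3x3 kernel values below are illustrative. Like deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation, which is what CNN layers call convolution.

```python
import numpy as np


def convolve2d_valid(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide `kernel` over `image` without padding (no kernel flip, as in CNN layers)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Portion P of the image currently under the kernel
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Element-wise product with K, then sum everything into one number
            out[i, j] = np.sum(patch * kernel)
    return out


I = np.array([[1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]])
K = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

print(convolve2d_valid(I, K))
```

With stride 1 the kernel visits 3 x 3 = 9 positions, giving a 3x3 convolved feature; for these values it is [[4, 3, 4], [2, 4, 3], [2, 3, 4]].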


| ![Movement of Kernel](mkl.png) | 
|:--:| 
| *Movement of Kernel* |


The filter moves to the right with a certain Stride Value until it has covered the complete width. It then hops down to the beginning (left) of the image by the same Stride Value and repeats the process until the entire image has been traversed.
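
The traversal order can be made explicit by listing the top-left corner of every kernel placement. This sketch assumes a square image and a square kernel for brevity.

```python
def kernel_positions(image_size: int, kernel_size: int, stride: int):
    """Top-left (row, col) of every kernel placement, in the order the kernel visits them."""
    positions = []
    for row in range(0, image_size - kernel_size + 1, stride):      # hop down after each sweep
        for col in range(0, image_size - kernel_size + 1, stride):  # move right across the width
            positions.append((row, col))
    return positions


print(kernel_positions(5, 3, stride=1))  # 9 placements -> 3x3 output
print(kernel_positions(5, 3, stride=2))  # [(0, 0), (0, 2), (2, 0), (2, 2)] -> 2x2 output
```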


| ![Convolution Operation](co.gif) | 
|:--:| 
| *Convolution operation on a MxNx3 image matrix with a 3x3x3 Kernel* |


In the case of images with multiple channels (e.g. RGB), the Kernel has the same depth as the input image. Element-wise multiplication is performed between each kernel slice Kn and its image channel In ([K1, I1]; [K2, I2]; [K3, I3]), and all the results are summed together with the bias to give us a squashed, one-depth-channel Convolved Feature Output.
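
A NumPy sketch of the multi-channel case, using random values for a 5x5x3 image and a 3x3x3 kernel (valid padding, stride 1; the bias of 1.0 is arbitrary). At each position the kernel is multiplied with all three channels at once and everything collapses into one number.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(5, 5, 3)).astype(np.float32)  # H x W x C input (RGB)
kernel = rng.standard_normal(size=(3, 3, 3)).astype(np.float32)  # kernel depth == input depth
bias = 1.0

out_h, out_w = image.shape[0] - 2, image.shape[1] - 2            # valid padding, stride 1
feature = np.zeros((out_h, out_w), dtype=np.float32)
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3, :]                        # 3x3x3 block of the input
        # Multiply element-wise across all channels ([K1, I1], [K2, I2], [K3, I3]),
        # sum everything and add the bias: one number per position
        feature[i, j] = np.sum(patch * kernel) + bias

print(feature.shape)  # (3, 3): a single-channel output map
```

Because the sum also runs over channels, the output has depth 1 regardless of how many input channels there are; a layer with several kernels stacks one such map per kernel.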


The objective of the Convolution Operation is to extract features such as edges from the input image. ConvNets need not be limited to only one Convolutional Layer. Conventionally, the first ConvLayer is responsible for capturing Low-Level features such as edges, color and gradient orientation. With added layers, the architecture adapts to High-Level features as well, giving us a network with a more wholesome understanding of the images in the dataset, similar to how we would understand them.
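
Even before any training, a hand-picked kernel shows what "extracting edges" means. Below, a vertical-edge kernel (a Sobel-style filter chosen for illustration; a trained ConvNet learns its own kernels) responds only where a hypothetical image changes from dark to bright.

```python
import numpy as np

# 6x6 grayscale image: dark left half (0), bright right half (10)
image = np.zeros((6, 6))
image[:, 3:] = 10

# Sobel-style vertical edge detector
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

response = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        response[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(response)  # every row is [0, 40, 40, 0]: strong response only around the edge
```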


| ![Convolution Operation](co2.gif) | 
|:--:| 
| *Convolution Operation with Stride Length = 2* |

There are two types of results to the operation: one in which the convolved feature is reduced in dimensionality compared to the input, and another in which the dimensionality is either increased or stays the same. The former is achieved with Valid Padding, the latter with Same Padding.

When we pad the 5x5x1 image with a border of zeros one pixel wide on each side, turning it into a 7x7x1 image, and then apply the 3x3x1 kernel over it, the convolved matrix turns out to be of dimensions 5x5x1, the same as the input. Hence the name: Same Padding.
On the other hand, if we perform the same operation without padding, we are presented with a 3x3x1 matrix, which here happens to match the dimensions of the Kernel: Valid Padding.
The following repository houses many such GIFs which would help you get a better understanding of how Padding and Stride Length work together to achieve results relevant to our needs.
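
The relationship between input size n, kernel size k, stride s and zero padding p per side is captured by the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch below checks it against the cases discussed in this section.

```python
def conv_output_size(n: int, k: int, stride: int = 1, pad: int = 0) -> int:
    """Output height/width for input size n, kernel size k, stride and zero padding per side."""
    return (n + 2 * pad - k) // stride + 1


print(conv_output_size(5, 3, stride=1, pad=0))  # 3  valid padding
print(conv_output_size(5, 3, stride=1, pad=1))  # 5  same padding: 5x5 padded to 7x7
print(conv_output_size(5, 3, stride=2, pad=0))  # 2
print(conv_output_size(5, 3, stride=2, pad=1))  # 3
```

For an odd kernel size k and stride 1, padding p = (k - 1) / 2 on each side keeps the output the same size as the input, which is why a 3x3 kernel needs one pixel of padding.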


| ![Convolution Operation](co3.gif) | 
|:--:| 
| *SAME padding: 5x5x1 image is padded with 0s to create a 7x7x1 image* |


#### Pooling Layer

Similar to the Convolutional Layer, the Pooling Layer is responsible for reducing the spatial size of the Convolved Feature. This dimensionality reduction decreases the computational power required to process the data. Furthermore, it is useful for extracting dominant features that are somewhat invariant to small shifts in position (and, to a lesser degree, small rotations), which helps the model train effectively.


| ![Pooling](p1.gif) | 
|:--:| 
| *3x3 pooling over 5x5 convolved feature* |


There are two types of Pooling: Max Pooling and Average Pooling. Max Pooling returns the maximum value from the portion of the image covered by the Kernel. On the other hand, Average Pooling returns the average of all the values from the portion of the image covered by the Kernel.
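
Both variants can be sketched in NumPy over a hypothetical 4x4 feature map, using a 2x2 window with stride 2 (the same setting as the MaxPool2D layer used in Part 2).

```python
import numpy as np


def pool2d(feature: np.ndarray, size: int, stride: int, mode: str = "max") -> np.ndarray:
    """Max or average pooling over a single-channel feature map (no padding)."""
    reduce = np.max if mode == "max" else np.mean
    out_h = (feature.shape[0] - size) // stride + 1
    out_w = (feature.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Window of the feature map currently covered by the pooling kernel
            window = feature[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out


fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [3, 1, 1, 2],
                 [0, 2, 7, 1]])

print(pool2d(fmap, size=2, stride=2, mode="max"))
print(pool2d(fmap, size=2, stride=2, mode="average"))
```

Max pooling keeps 6, 5, 3 and 7, the strongest activation in each window, while average pooling returns 3.5, 2, 1.5 and 2.75.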

Max Pooling also acts as a Noise Suppressant: it discards the noisy activations altogether and performs de-noising along with dimensionality reduction. Average Pooling, on the other hand, relies on dimensionality reduction alone as its noise-suppressing mechanism. This is one reason why Max Pooling often performs better than Average Pooling in practice.

The Convolutional Layer and the Pooling Layer together form the i-th layer of a Convolutional Neural Network. Depending on the complexity of the images, the number of such layers may be increased to capture low-level details even further, but at the cost of more computational power.
After going through the above process, we have successfully enabled the model to understand the features. Moving on, we flatten the final output and feed it to a regular Neural Network for classification.


| ![Max Pooling](mp.png) | 
|:--:| 
| *Max Pooling* |


#### Classification — Fully Connected Layer (FC Layer)


| ![Fully Connected Layer](fl.jpeg) | 
|:--:| 
| *Fully Connected Layer* |


Adding a Fully-Connected layer is a (usually) cheap way of learning non-linear combinations of the high-level features as represented by the output of the convolutional layer. The Fully-Connected layer is learning a possibly non-linear function in that space.

Now that we have converted our input image into a form suitable for our Multi-Layer Perceptron, we flatten it into a column vector. The flattened output is fed to a feed-forward neural network, and backpropagation is applied in every training iteration. Over a series of epochs, the model learns to distinguish between dominating and certain low-level features in images and classifies them using the Softmax classification technique.
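
Softmax turns the raw scores of the final layer into probabilities that sum to 1. The sketch below uses hypothetical scores for three classes, and also checks that for two classes, softmax over the scores [z, 0] gives the same probability as a single sigmoid unit, which is why a binary classifier can use one sigmoid output.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the max improves numerical stability without changing the result
    exp = np.exp(logits - np.max(logits))
    return exp / np.sum(exp)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))


scores = np.array([2.0, 1.0, 0.1])   # hypothetical outputs of the last FC layer
probs = softmax(scores)
print(probs, probs.sum())            # probabilities summing to 1; class 0 is most likely

# Two classes: softmax over [z, 0] equals sigmoid(z) for the first class
z = 1.5
print(softmax(np.array([z, 0.0]))[0], sigmoid(z))
```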

There are various CNN architectures which have been key in building the algorithms that power AI today. Some of them are listed below:

1. LeNet
2. AlexNet
3. VGGNet
4. GoogLeNet
5. ResNet
6. ZFNet


### Importing the libraries

```python
import tensorflow as tf
# Use the Keras bundled with TensorFlow rather than the standalone keras package
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```

```python
tf.__version__
```

```
'2.3.0'
```

## Part 1 - Data Preprocessing

### Preprocessing the Training set

Now we apply some geometric transformations to the training set images to make them a bit different from each other. This increases the diversity of the training set and helps avoid overfitting.

The process of applying a series of geometric transformations to an image dataset is called Image Augmentation, and its outputs are augmented images.

To augment our images we use the ImageDataGenerator class from the image module of Keras' preprocessing library:
we create an instance of ImageDataGenerator whose arguments specify the transformations to apply. The rescale argument additionally normalizes pixel values from the 0-255 range to 0-1.

We then apply this ImageDataGenerator object, train_datagen, to our training set images with flow_from_directory to get an augmented training set.

To know more about each of these parameters, check the [Keras API](https://keras.io/api/preprocessing/image/).
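
ImageDataGenerator applies these transformations randomly, on the fly. To see what two of them do to the pixel array, here is a NumPy sketch of rescale and horizontal_flip on a tiny hypothetical 2x3 RGB image; shear and zoom involve interpolation and are best left to Keras.

```python
import numpy as np

# Tiny hypothetical 2x3 RGB image with 8-bit pixel values (height x width x channels)
img = (np.arange(18).reshape(2, 3, 3) * 10).astype(np.uint8)

# rescale = 1./255: map pixel values from 0..255 to 0..1
rescaled = img.astype(np.float32) * (1.0 / 255)

# horizontal_flip = True: mirror the image along its width axis
# (ImageDataGenerator does this at random, not for every image)
flipped = rescaled[:, ::-1, :]

print("value range after rescaling:", rescaled.min(), rescaled.max())
print("red channel, first row:", img[0, :, 0], "-> after flip:", img[0, ::-1, 0])
```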

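```python
# Augmentation applied to the training images on the fly
train_datagen = ImageDataGenerator(rescale = 1./255,         # scale pixels to 0-1
                                   shear_range = 0.2,        # random shear transformations
                                   zoom_range = 0.2,         # random zoom in/out up to 20%
                                   horizontal_flip = True)   # random left-right mirroring
# Read images from the class subfolders, resize them to 64x64 and label them 0/1
training_set = train_datagen.flow_from_directory('dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 32,
                                                 class_mode = 'binary')
```

```
Found 7999 images belonging to 2 classes.
```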


### Preprocessing the Test set

On the test set we apply only the pixel rescaling transformation and not the geometric transformations, because this dataset simulates real-world images and must not be tampered with.

As above, we create an ImageDataGenerator object that applies the rescaling transformation and then connect it to our test set to perform the transformation.

```python
# Only rescale the test images; no augmentation
test_datagen = ImageDataGenerator(rescale = 1./255)
test_set = test_datagen.flow_from_directory('dataset/test_set',
                                            target_size = (64, 64),
                                            batch_size = 32,
                                            class_mode = 'binary')
```

```
Found 2000 images belonging to 2 classes.
```


## Part 2 - Building the CNN

*Refer to the Keras API for details on each of the* [arguments](https://keras.io/api/)

### Initialising the CNN

```python
# A Sequential model stacks layers one after another
cnn = tf.keras.models.Sequential()
```

### Step 1 - Convolution

```python
# 32 learnable 3x3 filters; input_shape matches target_size=(64, 64) with 3 RGB channels
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))
```

### Step 2 - Pooling

```python
# 2x2 max pooling with stride 2 roughly halves the height and width of each feature map
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
```

### Adding a second convolutional layer

```python
# input_shape is only needed on the first layer; Keras infers it from here on
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
```

### Step 3 - Flattening

```python
# Unroll the stacked feature maps into a single 1D vector for the dense layers
cnn.add(tf.keras.layers.Flatten())
```

### Step 4 - Full Connection

```python
# Fully connected hidden layer
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
```

### Step 5 - Output Layer

```python
# One sigmoid unit: the probability of the class encoded as 1 (binary classification)
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
```
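
As a sanity check, the output shape after each layer and the number of trainable parameters can be worked out by hand. This sketch assumes Keras defaults (padding='valid', stride 1 for Conv2D); its totals should agree with what cnn.summary() reports for the model above.

```python
def out_size(n: int, k: int, stride: int = 1) -> int:
    # Valid padding (Keras default) for both Conv2D and MaxPool2D
    return (n - k) // stride + 1


size, channels, params = 64, 3, 0

# Conv2D(filters=32, kernel_size=3)
params += 3 * 3 * channels * 32 + 32
size, channels = out_size(size, 3), 32     # 62 x 62 x 32
# MaxPool2D(pool_size=2, strides=2): no parameters
size = out_size(size, 2, stride=2)         # 31 x 31 x 32
# Second Conv2D(filters=32, kernel_size=3)
params += 3 * 3 * channels * 32 + 32
size = out_size(size, 3)                   # 29 x 29 x 32
size = out_size(size, 2, stride=2)         # 14 x 14 x 32
# Flatten
flat = size * size * channels              # 6272 features
# Dense(128) then Dense(1)
params += flat * 128 + 128
params += 128 * 1 + 1

print(f"flattened features: {flat}, trainable parameters: {params:,}")  # 6272, 813,217
```

Almost all of the parameters sit in the first Dense layer, which is one reason the convolution and pooling stages are used to shrink the feature maps before flattening.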

## Part 3 - Training the CNN

### Compiling the CNN

To compile our CNN we use the compile method of the Keras model, which takes the following arguments:

- optimizer: the algorithm that updates the weights on the connections between neurons of consecutive layers, with the aim of reducing the loss
- loss: the function that measures the difference between the predicted and actual values
- metrics: a list of measures used to evaluate the performance of the CNN

Here we use the 'adam' optimizer, a variant of stochastic gradient descent that adapts the step size for each weight, 'binary_crossentropy' as our loss function, and we measure only the accuracy of the CNN.

We are classifying each image as either a cat or a dog, which is a binary classification problem, hence the binary_crossentropy loss function.

If we had more than two classes to classify, we would use 'categorical_crossentropy' instead, together with a softmax output layer.
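
Binary cross-entropy penalizes confident wrong predictions heavily. Here is a NumPy sketch with hypothetical labels and sigmoid outputs; the clipping constant eps is an assumption added to avoid log(0).

```python
import numpy as np


def binary_crossentropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; y_pred are sigmoid probabilities in (0, 1)."""
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


labels = np.array([1, 0, 1, 0])            # hypothetical batch of 0/1 labels
good = np.array([0.9, 0.1, 0.8, 0.2])      # confident and correct
bad = np.array([0.2, 0.7, 0.4, 0.9])       # mostly wrong

print(binary_crossentropy(labels, good))   # about 0.164
print(binary_crossentropy(labels, bad))    # about 1.508
```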

```python
cnn.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
```

### Training the CNN on the Training set and evaluating it on the Test set

To train our CNN we use the fit method of the Keras model. Because the data comes from the generators created in Part 1, we pass training_set as x (it yields both the images and their labels) and test_set as validation_data, along with the number of epochs to train for.

In a neural network we don't train the model on the whole dataset at once; instead we supply it with batches of training data. The batch size is found by experiment, but 32 suits most cases; here it was already set to 32 in flow_from_directory.

An epoch is one complete pass over the whole training set, so epochs is the number of such passes to train for.
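
Using the numbers reported above (7999 training images, 2000 test images, batch size 32, 25 epochs), the amount of work per epoch can be worked out directly.

```python
import math

train_images, test_images, batch_size, epochs = 7999, 2000, 32, 25

steps_per_epoch = math.ceil(train_images / batch_size)   # the last batch is smaller
validation_steps = math.ceil(test_images / batch_size)
total_updates = steps_per_epoch * epochs                 # one weight update per batch

print(steps_per_epoch, validation_steps, total_updates)  # 250 63 6250
```

The last training batch of each epoch holds only 7999 - 249 x 32 = 31 images.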

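```python
# The generators supply images and labels in batches of 32; test_set is evaluated after every epoch
cnn.fit(x = training_set, validation_data = test_set, epochs = 25)
```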

```
Epoch 1/25
Epoch 2/25
...
Epoch 25/25
```

## Part 4 - Making a single prediction

First we load the image using the load_img method, which returns it in PIL format.
We then convert this PIL image into a NumPy array using the img_to_array method.

The CNN was trained on batches of images, so it expects an input with an extra (batch) dimension. To add this dimension to our single test image array we use NumPy's expand_dims function, which takes the array and the index of the new axis as arguments.

We then pass the array to the predict method. The training_set.class_indices attribute tells us which class each label value stands for.

predict returns one row of outputs per image in the batch, so to get a human-readable result we first access the first (and only) image with index 0, and then its single output value, again with index 0. The output unit is a sigmoid, so this value is a probability between 0 and 1.

Finally, we assign the category name according to this binary prediction and print the result.
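
The shape handling and thresholding can be tried without a trained model. In this sketch the image array, the prediction value 0.83 and the folder names 'cats' and 'dogs' are all hypothetical stand-ins for what img_to_array, cnn.predict and training_set.class_indices would return.

```python
import numpy as np

# Stand-in for image.img_to_array(...): one 64x64 RGB image with values 0..255
img_array = np.full((64, 64, 3), 128.0, dtype=np.float32)

batch = np.expand_dims(img_array / 255.0, axis=0)  # rescale like training, add batch axis
print(batch.shape)                                 # (1, 64, 64, 3)

# Hypothetical output of cnn.predict(batch): shape (batch, units) = (1, 1)
result = np.array([[0.83]], dtype=np.float32)

# Hypothetical mapping in the form returned by training_set.class_indices
class_indices = {"cats": 0, "dogs": 1}
index_to_class = {v: k for k, v in class_indices.items()}

probability = result[0][0]                         # first image, only output unit
label = index_to_class[int(probability > 0.5)]     # threshold the sigmoid at 0.5
print(label)                                       # dogs
```

Inverting class_indices avoids hard-coding which class is 1, since flow_from_directory assigns label indices in alphanumeric order of the class subdirectory names.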

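```python
import numpy as np
from tensorflow.keras.preprocessing import image

# load_img returns a PIL image resized to the input size the CNN was trained on
test_image = image.load_img('dataset/single_prediction/cat_or_dog_1.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)        # PIL image -> (64, 64, 3) float array
test_image = test_image / 255.0                     # same rescaling as the training and test sets
test_image = np.expand_dims(test_image, axis = 0)   # add the batch axis -> (1, 64, 64, 3)
result = cnn.predict(test_image)                    # shape (1, 1): one sigmoid probability
# training_set.class_indices shows which folder is encoded as 1; this cell assumes it is the dog class
if result[0][0] > 0.5:
  prediction = 'dog'
else:
  prediction = 'cat'
```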

```python
print(prediction)
```

```
dog
```
