In [1]:
'''Name: Apeksha Chavan
BE COMPS
UID:2017130013
FCI Exp 2: Experiment on studying different CNN architectures

What is a Convolutional Neural Network?
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that takes in an input image, assigns importance
(learnable weights and biases) to various aspects/objects in the image, and learns to differentiate one from the other.
The pre-processing required by a ConvNet is much lower than that required by other classification algorithms.
While in primitive methods filters are hand-engineered, ConvNets can learn these filters/characteristics given enough training.
The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex.
Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field.
A collection of such fields overlaps to cover the entire visual area.

Basic Architecture
There are two main parts to a CNN architecture:
- A convolution tool that separates and identifies the various features of the image for analysis, in a process called Feature Extraction.
- A fully connected layer that uses the output of the convolution process to predict the class of the image based on the features extracted in the previous stages.

CNN Layers
Three types of layers make up a CNN: convolutional layers, pooling layers,
and fully-connected (FC) layers. Stacking these layers forms a CNN architecture.
In addition to these three layers, there are two more important components, the dropout layer and the activation function, which are defined below.

1. Convolutional Layer
This is the first layer, and it is used to extract the various features from the input images.
In this layer, the mathematical operation of convolution is performed between the input image and a filter of a particular size MxM.
The filter slides over the input image, and at each position the dot product is taken between the filter and the MxM patch of the image it covers.

The output is called the feature map, and it gives us information about the image such as its corners and edges.
Later, this feature map is fed to other layers, which learn several other features of the input image.

2. Pooling Layer
In most cases, a Convolutional Layer is followed by a Pooling Layer.
The primary aim of this layer is to decrease the size of the convolved feature map in order to reduce computational costs.
It does this by reducing the number of connections between layers, and it operates independently on each feature map.
Depending on the method used, there are several types of pooling operations.

In Max Pooling, the largest element is taken from each section of the feature map.
Average Pooling calculates the average of the elements in a section of predefined size.
Sum Pooling computes the total sum of the elements in the predefined section.
The Pooling Layer usually serves as a bridge between the Convolutional Layer and the FC Layer.

3. Fully Connected Layer
The Fully Connected (FC) layer consists of neurons together with their weights and biases, and is used to connect the neurons of two different layers.
These layers are usually placed before the output layer and form the last few layers of a CNN architecture.

Here, the feature maps produced by the previous layers are flattened and fed to the FC layer.
The flattened vector then passes through a few more FC layers, where the usual mathematical operations take place.
At this stage, the classification process begins.

4. Dropout
Usually, when all the features are connected to the FC layer, the model can overfit the training dataset.
Overfitting occurs when a model fits the training data so closely that its performance suffers when it is used on new data.

To overcome this problem, a dropout layer is used, in which a few neurons are dropped from the neural network during training, effectively reducing the size of the model.
With a dropout rate of 0.3, about 30% of the nodes are dropped at random from the neural network.

5. Activation Functions
Finally, one of the most important parameters of the CNN model is the activation function.
It is used to learn and approximate continuous, complex relationships between the variables of the network.
In simple words, it decides which information should fire in the forward direction and which should not.

It adds non-linearity to the network. There are several commonly used activation functions, such as the ReLU, Softmax, tanh and Sigmoid functions.
Each of these functions has a specific usage. For a binary classification CNN model, the sigmoid and softmax functions are preferred, and for multi-class classification, softmax is generally used.'''
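
The convolution and pooling steps described in sections 1 and 2 of the notes can be illustrated with a small NumPy sketch. This is only a hand-rolled illustration, not how Keras implements these layers; the helper names `conv2d_valid` and `pool2d`, the toy 6x6 image, and the 2x2 filter are all assumptions made for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take the
    dot product with each patch. As in Keras' Conv2D, the kernel is not
    flipped, so strictly speaking this is cross-correlation."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/cols that do not fill a window
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "avg":
        return windows.mean(axis=(1, 3))
    if mode == "sum":
        return windows.sum(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")

image = np.arange(36, dtype=float).reshape(6, 6)       # toy 6x6 "image"
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])     # hypothetical 2x2 filter
feature_map = conv2d_valid(image, edge_filter)         # shape (5, 5)
max_pooled = pool2d(feature_map, size=2, mode="max")   # shape (2, 2)
sum_pooled = pool2d(feature_map, size=2, mode="sum")   # shape (2, 2)
```

A 6x6 input with a 2x2 filter gives a 5x5 feature map, and 2x2 pooling then shrinks it to 2x2, which is the size reduction the pooling section refers to.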


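
The dropout behaviour described in section 4 of the notes can be sketched in NumPy as follows. This assumes the "inverted dropout" variant, in which the surviving activations are scaled by 1 / (1 - rate) during training so that the layer needs no change at inference time; the function name `dropout` and the seed are illustrative choices, and the notebook's LeNet-5 model does not itself use a dropout layer.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return activations  # dropout is a no-op outside training
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)           # seed chosen only for reproducibility
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.3, rng=rng)

fraction_zeroed = np.mean(dropped == 0.0)     # roughly 0.3
mean_after = dropped.mean()                   # roughly 1.0, thanks to the rescaling
unchanged = dropout(activations, rate=0.3, rng=rng, training=False)
```

Because units are dropped only while training, the full network is still used for prediction.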

In [2]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import time
import os

In [3]:

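# Load MNIST: 28x28 grayscale digit images with integer labels 0-9
(train_x, train_y), (test_x, test_y) = keras.datasets.mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1]
train_x = train_x / 255.0
test_x = test_x / 255.0

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
train_x = tf.expand_dims(train_x, 3)
test_x = tf.expand_dims(test_x, 3)

# Validation set: the first 5,000 training samples. Note that these samples
# also remain in train_x/train_y, so validation metrics are not measured on unseen data.
val_x = train_x[:5000]
val_y = train_y[:5000]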
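
In the cell above, the validation slice is taken from the training arrays without being removed from them, so the validation samples are also seen during training. One possible way to keep the two sets disjoint is sketched below. The helper `holdout_split` is hypothetical, and small random stand-in arrays are used instead of MNIST so the sketch runs without downloading data.

```python
import numpy as np

def holdout_split(x, y, n_val):
    """Return (x_train, y_train), (x_val, y_val) with no shared samples:
    the first n_val samples become validation data, the rest training data."""
    return (x[n_val:], y[n_val:]), (x[:n_val], y[:n_val])

# Small stand-in arrays with MNIST-like shapes (assumption: real data would
# come from keras.datasets.mnist.load_data()).
x = np.random.default_rng(0).random((600, 28, 28, 1))
y = np.arange(600) % 10

(x_fit, y_fit), (x_val, y_val) = holdout_split(x, y, n_val=50)
```

The helper relies only on slicing, so the same pattern should also apply to the tensors produced by `tf.expand_dims`.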

In [4]:
lenet_5_model = keras.models.Sequential([
    # C1: 6 filters of 5x5, tanh (hyperbolic tangent); 'same' padding keeps the 28x28 MNIST input at 28x28
    keras.layers.Conv2D(6, kernel_size=5, strides=1, activation='tanh', input_shape=train_x[0].shape, padding='same'),
    keras.layers.AveragePooling2D(),  # S2: 2x2 average pooling (this Keras layer applies no activation)
    keras.layers.Conv2D(16, kernel_size=5, strides=1, activation='tanh', padding='valid'),  # C3
    keras.layers.AveragePooling2D(),  # S4: 2x2 average pooling (this Keras layer applies no activation)
    keras.layers.Flatten(),  # Flatten the feature maps into a single vector
    keras.layers.Dense(120, activation='tanh'),  # C5
    keras.layers.Dense(84, activation='tanh'),  # F6
    keras.layers.Dense(10, activation='softmax')  # Output layer: one probability per digit class
])
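
To see how the tensor shape changes through the model above, the layer arithmetic can be worked out in plain Python. This is a sketch under the assumption that `AveragePooling2D()` uses its default 2x2 window with stride 2; the helper names are made up for illustration, and `lenet_5_model.summary()` remains the authoritative source for the real numbers.

```python
def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)           # ceil(size / stride)
    return (size - kernel) // stride + 1    # 'valid': no padding

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

size = 28                                                  # MNIST input: 28x28x1
size = conv_output_size(size, kernel=5, padding="same")    # C1 -> 28x28x6
size = conv_output_size(size, kernel=2, stride=2)          # S2 -> 14x14x6
size = conv_output_size(size, kernel=5)                    # C3 -> 10x10x16
size = conv_output_size(size, kernel=2, stride=2)          # S4 -> 5x5x16
flat = size * size * 16                                    # Flatten -> 400

layer_params = {
    "C1": conv_params(5, 1, 6),
    "C3": conv_params(5, 6, 16),
    "C5": dense_params(flat, 120),
    "F6": dense_params(120, 84),
    "Output": dense_params(84, 10),
}
total_params = sum(layer_params.values())
```

The pooling and flatten layers have no trainable parameters, so only the two convolutional and three dense layers contribute to the total.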

In [5]:
#Now we can compile and build the model
lenet_5_model.compile(optimizer='adam', loss=keras.losses.sparse_categorical_crossentropy, metrics=['accuracy'])
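
The loss chosen above, sparse categorical cross-entropy, compares integer class labels with the probabilities produced by the softmax output layer. A minimal NumPy sketch of the idea follows; it is not the Keras implementation, and the logits, labels, and the `eps` clipping value are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class).
    `labels` holds integer class indices, not one-hot vectors."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

logits = np.array([[2.0, 1.0, 0.1],   # made-up scores for 3 classes
                   [0.0, 0.0, 0.0]])  # equal scores -> uniform probabilities
labels = np.array([0, 2])

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
```

The "sparse" variant fits here because the MNIST labels are plain integers from 0 to 9 and do not need to be one-hot encoded first.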

In [6]:

# Build the log directory from separate components so the path works on any OS
root_logdir = os.path.join(os.curdir, "logs", "fit")

def get_run_logdir():
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    return os.path.join(root_logdir, run_id)

run_logdir = get_run_logdir()
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)

In [7]:
lenet_5_model.fit(train_x, train_y, epochs=5, validation_data=(val_x, val_y), callbacks=[tensorboard_cb])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x22c216ff070>

In [8]:
lenet_5_model.evaluate(test_x, test_y)



[0.0492842011153698, 0.9843999743461609]
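
`evaluate` returns the loss followed by the metrics passed to `compile`, so the two numbers above are the test loss and the test accuracy. As a rough worked example, and assuming the standard MNIST test split of 10,000 images, the accuracy can be converted into a count of misclassified digits:

```python
# Values copied from the evaluate() output above: [loss, accuracy]
test_loss, test_accuracy = 0.0492842011153698, 0.9843999743461609

n_test = 10_000                              # assumed size of the MNIST test split
n_correct = round(test_accuracy * n_test)    # correctly classified images
n_wrong = n_test - n_correct                 # misclassified images
error_rate_percent = 100 * (1 - test_accuracy)
```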

In [2]:
'''CONCLUSION:
In this experiment, I implemented the LeNet-5 architecture and trained and tested it on the MNIST dataset.
Keras provides a Sequential API for stacking the layers of a neural network on top of each other, so I used it,
together with the Keras layer classes, to build the CNN classification model.
After training, the model achieves a validation accuracy of over 90%.
On the test dataset, the trained model achieved about 98.4% accuracy, which is a strong result for such a simple network.'''
