# Convolutional Neural Network

__1. Image Input Data :-__

Let’s assume we have a dataset of grayscale images. Each image is 32 pixels wide and 32 pixels high, and pixel values are between 0 and 255, i.e. a matrix of 32 X 32 X 1 or 1,024 pixel values. Image input data is expressed as a 3-dimensional matrix of width X height X channels. If we were using color images in our example, we would have 3 channels for the red, green and blue pixel values, i.e. 32 X 32 X 3.
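
As a minimal sketch of this layout, the snippet below stores an image as plain nested Python lists indexed as `image[row][col][channel]`. This layout and the helpers `make_image`, `shape` and `num_values` are hypothetical illustrations; in practice a library such as NumPy would normally hold the pixel data.

```python
def make_image(width, height, channels, fill=0):
    """Return a height x width x channels nested list filled with `fill`."""
    return [[[fill] * channels for _ in range(width)] for _ in range(height)]

def shape(image):
    """Return (width, height, channels) of a nested-list image."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return width, height, channels

def num_values(image):
    """Total number of pixel values: width * height * channels."""
    w, h, c = shape(image)
    return w * h * c

gray = make_image(32, 32, 1)   # grayscale: one channel
color = make_image(32, 32, 3)  # RGB: three channels

print(shape(gray), num_values(gray))    # (32, 32, 1) 1024
print(shape(color), num_values(color))  # (32, 32, 3) 3072
```
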

__2. Convolutional Layer :-__

We define a convolutional layer with 10 filters, a receptive field 5 pixels wide and 5 pixels high, and a stride of 1. Because each filter can only get input from (i.e. see) 5 X 5 (25) pixels at a time, each filter requires 25 + 1 input weights (plus 1 for the bias input). Dragging the 5 X 5 receptive field across the 32 X 32 input image with a stride of 1 and no padding gives (32 - 5) / 1 + 1 = 28 positions along each dimension, so the result is a feature map of 28 X 28 output values, or 784 distinct activations per image.
We have 10 filters, so that is 10 different 28 X 28 feature maps, or 7,840 outputs, created for one image. Finally, we have 26 inputs per filter, 10 filters and 28 X 28 output values to calculate per filter, therefore we have a total of 26 X 10 X 28 X 28 or 203,840 connections in our convolutional layer, if we want to phrase it using traditional neural network nomenclature. Convolutional layers also apply a nonlinear transfer function as part of the activation, and the rectifier (ReLU) activation function is the popular default.
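
The arithmetic above can be checked with a naive, unoptimized sketch of a "valid" (no padding) convolution in pure Python. It assumes square, single-channel inputs and kernels, uses random values in place of a real image and learned filter weights, and applies ReLU to each output. `conv2d_valid` is a hypothetical helper, not a library function. Because every position reuses the same filter weights, the script also counts the distinct learnable weights separately from the connections.

```python
import random

def conv2d_valid(image, kernel, bias=0.0, stride=1):
    """Slide a square kernel over a square 2-D image without padding.

    Each output value is the weighted sum of the pixels under the kernel
    plus the bias, passed through the rectifier (ReLU)."""
    n = len(image)   # assumes an n x n input
    k = len(kernel)  # assumes a k x k kernel
    out_size = (n - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = bias
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(max(0.0, total))  # ReLU activation
        out.append(row)
    return out

random.seed(0)
# Random 32 x 32 grayscale "image" and 10 random 5 x 5 filters (stand-ins only).
image = [[random.randint(0, 255) for _ in range(32)] for _ in range(32)]
num_filters, k = 10, 5
filters = [[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
           for _ in range(num_filters)]
feature_maps = [conv2d_valid(image, f) for f in filters]

out_size = len(feature_maps[0])
weights_per_filter = k * k + 1                     # 25 weights + 1 bias
outputs = num_filters * out_size * out_size
connections = weights_per_filter * num_filters * out_size * out_size
shared_params = weights_per_filter * num_filters   # distinct learnable weights

print(out_size, outputs, connections, shared_params)  # 28 7840 203840 260
```
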

__3. Pool Layer :-__

We define a pooling layer with a receptive field 2 inputs wide and 2 inputs high, and a stride of 2 so that the pooling windows do not overlap. This results in feature maps that are half the width and half the height of the input feature maps: 10 different 28 X 28 feature maps as input become 10 different 14 X 14 feature maps as output. We use a max() operation for each receptive field, so the activation is the maximum input value in that window.
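
A minimal sketch of 2 X 2 max pooling with a stride of 2, assuming square 2-D feature maps stored as nested lists. `max_pool2d` is a hypothetical helper. It shows both a small hand-checkable example and the 28 X 28 to 14 X 14 reduction described above.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling over a square 2-D feature map (no padding)."""
    n = len(feature_map)
    out_size = (n - size) // stride + 1
    return [
        [
            max(feature_map[i * stride + di][j * stride + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

# A small 4 x 4 example: each non-overlapping 2 x 2 block becomes its maximum.
fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
print(max_pool2d(fm))  # [[4, 5], [6, 7]]

# A 28 x 28 feature map (arbitrary values) pools down to 14 x 14.
big = [[(i * 28 + j) % 17 for j in range(28)] for i in range(28)]
pooled = max_pool2d(big)
print(len(pooled), len(pooled[0]))  # 14 14
```
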

__4. Fully Connected Layer :-__

Finally, we can flatten the square feature maps into a single vector and feed it to a traditional fully connected layer. We can define the fully connected layer with 200 hidden neurons, each with 10 X 14 X 14 input connections, or 1,960 + 1 weights per neuron (the extra one is the bias). That is a total of 392,200 connections and weights to learn in this layer. At the output, we can use a sigmoid or softmax transfer function to output probabilities of class values directly.
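
The flattening step, the weight count and the softmax output can be sketched as below. This is an illustration under assumptions: the pooled maps are filled with zeros as placeholders, the three scores passed to `softmax` are made up, and `flatten`, `dense_param_count` and `softmax` are hypothetical helpers rather than a library API.

```python
import math

def flatten(feature_maps):
    """Flatten a list of 2-D feature maps into one flat list."""
    return [v for fm in feature_maps for row in fm for v in row]

def dense_param_count(n_inputs, n_neurons):
    """Weights plus one bias per neuron in a fully connected layer."""
    return (n_inputs + 1) * n_neurons

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

pooled_maps = [[[0.0] * 14 for _ in range(14)] for _ in range(10)]  # 10 x 14 x 14
flat = flatten(pooled_maps)
print(len(flat))                          # 1960
print(dense_param_count(len(flat), 200))  # 392200

probs = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
print(round(sum(probs), 6))       # 1.0
```
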


## Best Practices for CNNs

1. __Receptive Field Size__: The patch should be as small as possible, but large enough to see features in the input data. It is common to use 3 X 3 on small images and 5 X 5, 7 X 7 or larger on larger images.
2. __Stride Width__: Use the default stride of 1. It is easy to understand, and you don’t need padding to handle the receptive field falling off the edge of your images. The stride could be increased to 2 or larger for larger images.
3. __Number of Filters__: Filters are the feature detectors. Generally, fewer filters are used at the input layer and increasingly more filters are used at deeper layers.
4. __Padding__: Values read from outside the border of the input (non-input data) are set to zero, which is called zero padding. This is useful when you cannot or do not want to standardize input image sizes, or when you want to use receptive field and stride sizes that do not neatly divide up the input image size.
5. __Pooling__: Pooling is a destructive, generalizing process that helps reduce overfitting. The receptive field size is almost always set to 2 X 2 with a stride of 2, which discards 75% of the activations from the output of the previous layer (each window of four values is reduced to one).
6. __Pattern Architecture__: It is common to pattern the layers in your network architecture. This might be one, two or some number of convolutional layers followed by a pooling layer. This structure can then be repeated one or more times. Finally, fully connected layers are often only used at the output end and may be stacked one, two or more deep.
7. __Dropout__: CNNs have a habit of overfitting, even with pooling layers. Dropout should be used, for example between fully connected layers and perhaps after pooling layers.
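
The stride, padding, pooling and pattern-architecture guidelines above can be checked with a small shape tracer. This sketch assumes the standard one-dimensional output-size formula (W - F + 2P) / S + 1 with symmetric zero padding, and a hypothetical architecture of two [conv, conv, pool] blocks with 32 and then 64 filters. `conv_output_size` and `trace_shapes` are illustrative helpers, not a library API.

```python
def conv_output_size(width, field, stride=1, padding=0):
    """Output width of a convolution or pooling window along one dimension.

    Raises ValueError when the window and stride do not neatly tile the
    (zero-padded) input."""
    span = width - field + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError(
            f"field={field}, stride={stride}, padding={padding} "
            f"do not fit an input of width {width}"
        )
    return span // stride + 1

def trace_shapes(width, blocks, field=3):
    """Trace (layer, width, filters) through repeated [conv, conv, pool] blocks.

    Each conv uses stride 1 with zero padding of (field - 1) // 2 so the
    width is preserved; each pool is 2 x 2 with stride 2."""
    padding = (field - 1) // 2
    shapes = []
    for filters in blocks:
        for _ in range(2):  # two conv layers per block
            width = conv_output_size(width, field, stride=1, padding=padding)
            shapes.append(("conv", width, filters))
        width = conv_output_size(width, 2, stride=2)  # pooling halves the width
        shapes.append(("pool", width, filters))
    return shapes

# Hypothetical architecture: filter counts grow with depth (32 -> 64).
for layer in trace_shapes(32, blocks=[32, 64]):
    print(layer)

# A 5-wide field with stride 3 does not tile a width of 28 (28 - 5 = 23) ...
try:
    conv_output_size(28, 5, stride=3)
except ValueError as err:
    print("error:", err)
# ... but zero padding of 2 on each side makes it fit: (28 - 5 + 4) / 3 + 1.
print(conv_output_size(28, 5, stride=3, padding=2))  # 10
```
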

## Handwritten Digit Recognition