# Convolutional Neural Networks: Algorithm

_Convolutional Neural Networks (CNNs)_ are an extension of *ANNs* designed mainly for image processing and classification. They work in two main stages: first extracting features from an image, then training an *ANN* to recognize objects from those features.

<img src="convolutionalNeuralNetwork.png" width="700px;" alt="Full diagram of the convolutional neural network." />

## Algorithm

__NOTE:__ The following images contain a highly abstract representation of _CNNs_. Individual pixels are normally represented as intensities of the three primary colors of light: *red, green,* and *blue*. Representing pixels as 0s and 1s instead makes the algorithm easier to understand.

### 1. Convolution & Rectification

#### Convolution

<img src="convolutionalLayer.png" width="600px;" alt="The convolutional layer identifies properties in an image." />

Convolution is the process of finding where features are located within an image using a *feature detector*, a small grid of pixel values that represents a "feature". The _feature detector_ is compared with each region of the input image, and the results are recorded in a _feature map_. Each cell of the _feature map_ is the sum of the element-wise products of the _feature detector_ and the image region beneath it; for images made of 0s and 1s, this is simply the number of 1s in the _feature detector_ that line up with 1s in the image. Because the _feature detector_ above contains only four 1s, a value of 4 in the _feature map_ means the corresponding region of the input image matches every 1 in the _feature detector_.

The _feature detector_ slides across the input image much like iterating over a 2D array, except that each positional increment is called a _stride_ rather than a _step_. Both the _stride_ and the _feature detector_ dimensions can be changed; a 3x3 _feature detector_ with a _stride_ of 1 or 2 is a common choice. Convolution is performed with many _feature detectors_, each producing its own _feature map_, and together these _feature maps_ form a _convolutional layer_.
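The convolution step can be sketched in plain Python as below. This is a minimal illustration, assuming images and feature detectors are 2D lists of numbers, no padding, and the same stride horizontally and vertically; the function name `convolve` and the example image are hypothetical. Like most deep-learning libraries, it does not flip the feature detector (strictly speaking a cross-correlation), which is what CNNs call convolution.

```python
def convolve(image, detector, stride=1):
    """Slide `detector` over `image` without padding and return the feature map.

    Each cell is the sum of element-wise products of the detector and the
    image region beneath it. For 0/1 images this is the number of 1s in the
    detector that line up with 1s in the image.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(detector), len(detector[0])
    out_rows = (rows - k_rows) // stride + 1
    out_cols = (cols - k_cols) // stride + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            top, left = i * stride, j * stride
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[top + a][left + b] * detector[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5x5 binary image and a 3x3 detector containing four 1s.
image = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
]
detector = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

feature_map = convolve(image, detector, stride=1)
print(feature_map[0][0])  # 4: the top-left region matches all four 1s
print(len(convolve(image, detector, stride=2)))  # 2: a larger stride gives a smaller map
```

With a 5x5 image and a 3x3 detector, a stride of 1 gives a 3x3 _feature map_, while a stride of 2 shrinks it to 2x2.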

__NOTE:__ A _feature detector_ can also be called a _kernel_ or _filter_. A _feature map_ can also be called an _activation map_ or _convolved image_.

#### Rectification

<img src="relu.jpg" width="800px;" alt="Increasing the non-linearity of images using the ReLU." />

Non-linearity increases the approximation power of a neural network model. Images already contain many non-linear features, such as abrupt transitions between pixels and changes in color. Convolution itself, however, is a linear operation: the _feature map_ on the left above highlights features, but it still contains smooth gradients from dark to light. Applying the rectifier function (ReLU), $f(x) = \max(0, x)$, to each *feature map* sets every negative value to zero. This removes those gradual transitions, increases non-linearity, and further highlights the specific features of objects in the image, producing a result similar to the image on the right above.
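Rectification is a simple element-wise operation. The sketch below assumes a _feature map_ stored as a 2D list; the name `relu` and the feature-map values are hypothetical, chosen only to include some negative numbers.

```python
def relu(feature_map):
    """Apply the rectifier f(x) = max(0, x) to every cell of a feature map."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map; detectors with negative weights can produce
# negative values, which rectification sets to zero.
feature_map = [
    [-3, 1, 2],
    [0, -1, 4],
    [5, -2, -6],
]
print(relu(feature_map))  # [[0, 1, 2], [0, 0, 4], [5, 0, 0]]
```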

- - - - -

### 2. Max Pooling

<img src="maxPooling.png" width="600px;" alt="Abstracting features of an image through max pooling." />

Max pooling involves iterating through regions of a _feature map_ and selecting the maximum value from each region. As with convolution, both the region size and the _stride_ can be changed. Pooling is used for two reasons. First, it reduces the spatial size of the matrix representation of an image, which reduces the amount of computation in later layers. Second, and more importantly, it helps prevent over-fitting in the _ANN_ used in the later steps. After pooling, the image is represented as a set of characteristics that do not need to be in an exact location, so images that portray the same object slightly differently (for example, shifted or distorted) may still be assigned the same classification by the _CNN_. The resulting matrices are called _pooled feature maps_ and together form the _pooling layer_.

__NOTE:__ There are many types of pooling functions, such as average pooling (a.k.a. subsampling). Max pooling is the most commonly used pooling function.
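Both max pooling and average pooling can be sketched with one helper. This assumes a 2D list, a square window, and no padding; windows that would run past the edge are simply skipped, whereas the figure above (and some libraries) may pad the edges instead. The function `pool`, its `mode` parameter, and the example values are hypothetical.

```python
def pool(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2D feature map with a size x size window and no padding.

    Windows that would run past the edge are skipped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    out_rows = (rows - size) // stride + 1
    out_cols = (cols - size) // stride + 1
    pooled = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            window = [
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            elif mode == "average":
                row.append(sum(window) / len(window))
            else:
                raise ValueError(f"unknown pooling mode: {mode!r}")
        pooled.append(row)
    return pooled


# Hypothetical 4x4 (already rectified) feature map.
feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]
print(pool(feature_map, mode="max"))      # [[6, 8], [3, 4]]
print(pool(feature_map, mode="average"))  # [[2.75, 4.75], [1.75, 1.75]]
```

A 2x2 window with a stride of 2 halves each dimension, so the 4x4 map becomes a 2x2 _pooled feature map_.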

- - - - -

### 3. Flattening

<img src="flattening.png" width="600px;" alt="Creating the input layer from the pooled feature map." />

Each _pooled feature map_ can be flattened, which is the process of transforming a matrix into a one-dimensional array (for example, by concatenating its rows). During the flattening stage, all _pooled feature maps_ are flattened and concatenated into a single _input layer_ for the _ANN_.
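A minimal flattening sketch, assuming row-by-row (row-major) order; any fixed order works as long as it is the same for every image. The name `flatten` and the two pooled feature maps are hypothetical.

```python
def flatten(pooled_feature_maps):
    """Flatten each pooled feature map row by row and concatenate the results."""
    return [
        value
        for feature_map in pooled_feature_maps
        for row in feature_map
        for value in row
    ]


pooled_feature_maps = [
    [[6, 8], [3, 4]],  # first pooled feature map
    [[1, 0], [2, 5]],  # second pooled feature map
]
input_layer = flatten(pooled_feature_maps)
print(input_layer)  # [6, 8, 3, 4, 1, 0, 2, 5]
```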

- - - - -

### 4. The Artificial Neural Network

<img src="fullyConnected.png" width="500px;" alt="Image of a fully connected artificial neural network."/>

Once the input layer has been created by flattening the *pooled feature maps*, it can be classified using the _ANN_. In a _CNN_, the hidden layers of this _ANN_ must be fully connected: each neuron receives input from every neuron in the previous layer and sends its output to every neuron in the next layer. The output layer has one neuron per class, although two classes can be represented by a single neuron with a binary output.
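A fully connected layer can be sketched as a weighted sum over every input followed by an activation. The example below assumes a ReLU hidden layer and a single sigmoid output neuron for two classes; the function `dense`, the weights, and the biases are hypothetical and hand-picked, whereas a real network learns them.

```python
import math


def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output neuron receives every input.

    `weights[n]` holds one weight per input for output neuron `n`.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


# Hypothetical flattened input and hand-picked weights (normally learned).
input_layer = [1.0, 0.0, 2.0]
hidden = dense(input_layer, [[0.5, -1.0, 0.25], [-0.5, 1.0, 1.0]], [0.0, 0.5], relu)
output = dense(hidden, [[1.0, -0.5]], [0.0], sigmoid)
print(hidden)  # [1.0, 2.0]
print(output)  # [0.5]: probability of one of the two classes
```

Because every neuron's weight list has one entry per input, each neuron "sees" the whole previous layer, which is exactly what fully connected means.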

__Learning:__
During training, the _CNN_ adjusts not only the weights of the synapses but also the values in the _feature detectors_, both through backpropagation. This is because the features the algorithm initially looks for may not be the ones that best distinguish between classes.
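How a _feature detector_ gets updated can be sketched with the chain rule. This is a minimal illustration assuming no padding, a single input channel, and plain gradient descent; `kernel_gradient`, `gradient_descent_step`, and the `upstream` values (the gradient of the cost with respect to each _feature map_ cell, which backpropagation would supply) are hypothetical. Deep-learning frameworks compute this automatically.

```python
def kernel_gradient(image, upstream, k_rows, k_cols, stride=1):
    """Gradient of the loss with respect to each feature-detector weight.

    upstream[i][j] is dLoss/dFeatureMap[i][j]. Since
        feature_map[i][j] = sum over a, b of detector[a][b] * image[i*stride + a][j*stride + b],
    the chain rule gives
        dLoss/ddetector[a][b] = sum over i, j of upstream[i][j] * image[i*stride + a][j*stride + b].
    """
    grad = [[0.0] * k_cols for _ in range(k_rows)]
    for i, upstream_row in enumerate(upstream):
        for j, g in enumerate(upstream_row):
            for a in range(k_rows):
                for b in range(k_cols):
                    grad[a][b] += g * image[i * stride + a][j * stride + b]
    return grad


def gradient_descent_step(detector, grad, learning_rate=0.1):
    """Move each detector weight a small step against its gradient."""
    return [
        [w - learning_rate * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(detector, grad)
    ]


image = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
# Hypothetical upstream gradient for the 2x2 feature map: only the top-left
# cell affects the loss, so the gradient equals the top-left image region.
upstream = [[1.0, 0.0], [0.0, 0.0]]
grad = kernel_gradient(image, upstream, k_rows=2, k_cols=2)
print(grad)  # [[1.0, 0.0], [0.0, 1.0]]

detector = [[0.5, 0.5], [0.5, 0.5]]
detector = gradient_descent_step(detector, grad, learning_rate=0.1)
print(detector)  # weights under image 1s decrease, the others stay at 0.5
```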

- - - - - 

### Softmax & Cross Entropy

One of the axioms of probability is that the probabilities of all possible outcomes add up to 1, but nothing guarantees this for the raw values in the output layer of the _CNN_. This can be fixed by applying the _Softmax_ function (the function used in multinomial logistic regression), which rescales the output neuron values so that each lies between 0 and 1 and together they add up to 1.

$$\LARGE f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$$

__Where:__
* $z$: Vector of output neuron values
* $j$: Index of the current output neuron
* $k$: Index running over all output neurons
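The _Softmax_ formula translates directly into code. This sketch uses only the standard library; the name `softmax` and the output values are hypothetical. Subtracting the largest value first is a standard numerical-stability trick that is not part of the formula above: it cancels in the fraction, so the result is mathematically unchanged, but it keeps `math.exp` from overflowing on large inputs.

```python
import math


def softmax(z):
    """f_j(z) = exp(z_j) / sum_k exp(z_k) for every output neuron j."""
    # Subtracting max(z) from every value leaves the result unchanged
    # (it cancels in the fraction) but prevents overflow in math.exp.
    shift = max(z)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


output_values = [2.0, 1.0, 0.1]  # hypothetical raw output neuron values
probabilities = softmax(output_values)
print([round(p, 3) for p in probabilities])  # [0.659, 0.242, 0.099]
print(round(sum(probabilities), 10))          # 1.0
```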

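For classification problems like this one, a *Cross Entropy* cost function usually works better than alternatives such as mean squared error. With mean squared error, the weight updates are scaled by the slope of the output activation, which becomes very small when an output neuron saturates, so training can stall even when the prediction is wrong. When *Cross Entropy* is paired with *Softmax*, the gradient with respect to each output neuron's value is simply $q_j - p_j$ (the actual value minus the expected value), so weight changes stay proportional to the error. Its logarithm also means that improving a very poor prediction (for example, raising the probability of the correct class from 0.001 to 0.01) still produces a large change in cost, which promotes further learning.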

$$\LARGE H(p, q) = -\sum_k p_k \log(q_k)$$

__Where:__
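* $p$: Expected value (the target probability of each class, e.g. 1 for the correct class and 0 otherwise)
* $q$: Actual value (the probability predicted by the _Softmax_ output)
* $k$: Index running over all output neurons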
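The cost and its gradient can be checked numerically. This sketch assumes a one-hot expected value and uses only the standard library; `softmax`, `cross_entropy`, and the example values are hypothetical. The loop compares a central finite-difference estimate of $\partial H / \partial z_j$ with $q_j - p_j$.

```python
import math


def softmax(z):
    """Numerically stable softmax."""
    shift = max(z)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q):
    """H(p, q) = -sum_k p_k * log(q_k); terms with p_k == 0 contribute nothing."""
    return -sum(p_k * math.log(q_k) for p_k, q_k in zip(p, q) if p_k > 0)


p = [1.0, 0.0, 0.0]   # expected: the first class is correct (one-hot)
z = [2.0, 1.0, 0.1]   # hypothetical raw output neuron values
q = softmax(z)        # actual: predicted probabilities
loss = cross_entropy(p, q)
print(round(loss, 3))  # 0.417

# Check numerically that dH/dz_j = q_j - p_j (central differences).
eps = 1e-6
for j in range(len(z)):
    plus, minus = list(z), list(z)
    plus[j] += eps
    minus[j] -= eps
    numeric = (cross_entropy(p, softmax(plus)) - cross_entropy(p, softmax(minus))) / (2 * eps)
    print(j, round(numeric, 6), round(q[j] - p[j], 6))
```

Skipping terms where $p_k = 0$ does not change the sum, and it avoids evaluating $\log(0)$ if a predicted probability underflows to zero.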

__NOTE:__ The _Softmax_ & _Cross Entropy_ functions are used for multiclass logistic regression, while the _Sigmoid_ and _Binary Cross Entropy_ functions are their two-class counterparts.
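The two-class counterparts can be compared with the multiclass versions directly. The sketch below, with hypothetical function names and a hypothetical output value, shows that a single _Sigmoid_ neuron behaves like a two-neuron _Softmax_ whose second value is fixed at 0, since $e^{z} / (e^{z} + e^{0}) = 1 / (1 + e^{-z})$, and that _Binary Cross Entropy_ then gives the same cost as _Cross Entropy_.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def softmax(z):
    shift = max(z)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(p, q):
    """-(p*log(q) + (1 - p)*log(1 - q)) for target p in {0, 1} and probability q."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


def cross_entropy(p, q):
    return -sum(p_k * math.log(q_k) for p_k, q_k in zip(p, q) if p_k > 0)


z = 0.8  # hypothetical value of the single output neuron
# One sigmoid neuron behaves like a two-neuron softmax whose second value is 0.
print(sigmoid(z), softmax([z, 0.0])[0])
# With the correct class p = 1, both costs reduce to -log(q).
print(binary_cross_entropy(1.0, sigmoid(z)), cross_entropy([1.0, 0.0], softmax([z, 0.0])))
```

This equivalence is why a single output neuron with a binary output is enough for two classes.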

