# Common Architectures in Convolutional Neural Networks

### Amin Zabardast

#### Quick review: Convolutional Neural Network

Simply described, a Convolutional Neural Network (CNN or ConvNet) is a type of feed-forward artificial neural network in which some of the hidden layers are structured so that their output for a given input is that input convolved with a kernel.
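
As a point of reference, the operation computed by such a layer can be written as below. The symbols $X$ (input), $K$ (a $k\times k$ kernel), $b$ (a bias term), and $Y$ (output) are notation introduced here for illustration. Strictly speaking this unflipped form is a cross-correlation; it is the form most deep learning libraries implement under the name convolution.

$$Y_{i,j} = b + \sum_{m=0}^{k-1}\sum_{n=0}^{k-1} K_{m,n}\, X_{i+m,\,j+n}$$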

This kind of neural network is powerful in image analysis applications because a pixel in an image tends to be correlated with the pixels in its vicinity, and this correlation generally weakens as the distance between pixels grows. The convolution operation gathers information from a pixel's vicinity and assigns properties to the pixel in the form of numerical values. These values can form low-level features such as edges and corners in the first layers and high-level features such as faces and cars in deeper layers.
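
The idea of a kernel turning a pixel's neighborhood into a feature value can be sketched with NumPy. This is a minimal illustration, not an efficient implementation: the helper `conv2d_valid`, the toy image, and the hand-picked Sobel-like kernel are all assumptions made for the example, whereas a real ConvNet would learn its kernel weights.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value summarizes one kh x kw neighborhood of the input.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (Sobel-like), used here only for illustration.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
print(response)
```

The response is nonzero only in the columns whose neighborhood straddles the dark-to-bright boundary, which is the sense in which the kernel acts as a low-level edge feature.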

#### Architectures are unavoidable

Creating a ConvNet, even the simplest one, requires some design effort. The number of hidden layers, the convolution kernel size, and the max pooling stride are just a few of the properties that must be chosen, and the choices only get more complicated from there.
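
These choices interact through simple arithmetic. As a sketch, using notation introduced here rather than taken from any of the papers discussed, let $n$ be the input width, $k$ the kernel (or pooling window) width, $p$ the zero padding on each side, and $s$ the stride. The output width $o$ of a convolution or pooling layer is then:

$$o = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1$$

For example, a $5\times5$ kernel applied to a $32\times32$ input with $p = 0$ and $s = 1$ gives $o = 28$, and a following $2\times2$ pooling layer with $s = 2$ gives $o = 14$.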

The design of a ConvNet has a direct effect on its capabilities. For example, a shallow network may not have enough non-linear capacity to classify a large, complex data space, while a very deep network is hard to train because of the vanishing gradient problem. Choosing an architecture is therefore a trade-off between representational power and ease of optimization.
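
The vanishing gradient problem can be illustrated with a deliberately simplified model. The sketch below assumes a chain of scalar sigmoid units with a fixed weight and a fixed pre-activation `z` at every layer (both hypothetical simplifications), so the gradient reaching the first layer is scaled by the product of the per-layer factors `weight * sigmoid'(z)`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # reaches its maximum, 0.25, at z == 0

def backprop_factor(depth, z=0.0, weight=1.0):
    """Product of the per-layer factors weight * sigmoid'(z) over `depth` layers."""
    return float((weight * sigmoid_grad(z)) ** depth)

for depth in (2, 5, 10, 20):
    print(depth, backprop_factor(depth))
```

Even in the best case for the sigmoid (`z = 0`), each layer multiplies the gradient by at most 0.25 under these assumptions, so the factor shrinks geometrically with depth.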

#### Case Studies

In this document, we will analyze some of the ConvNet architectures that achieved the highest performance of their time. Many of these architectures have been implemented in common frameworks, or have finely tuned implementations written in lower-level languages such as C++ to boost their performance.

* LeNet (1998)
* AlexNet (2012)
* VGG (2014)
* GoogLeNet (2014)
* ResNet (2015)

Studying these architectures leads to a better understanding of the main challenges in Convolutional Neural Networks and provides insights for creating better-performing architectures.

## LeNet

The concept of a convolutional layer followed by a pooling layer is at the heart of LeNet models. These convolution and pooling stages are followed by some fully connected (FC) layers for classification.

The convolutional layers usually use a $5\times5$ filter, although there are examples with different sizes. The pooling layers usually use $2\times2$ max pooling or none at all; however, there are examples of $4\times4$ max pooling layers when it comes to large images.

<img align="center" src="imgs/img1.png"/>
<p>Image 1: Architecture of LeNet 5 (Image credits: LeCun, 1998)</p>
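
The $2\times2$ max pooling step mentioned above can be sketched in a few lines of NumPy. The function name `max_pool2d` and the example feature map are hypothetical, chosen only for illustration; the sketch assumes non-overlapping windows (stride equal to the window size) on a single 2D feature map.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling over a 2D feature map."""
    h, w = feature_map.shape
    # Drop trailing rows/columns that do not fill a complete window.
    h_out, w_out = h // size, w // size
    trimmed = feature_map[:h_out * size, :w_out * size]
    # View the map as (h_out, size, w_out, size) blocks and take each block's max.
    blocks = trimmed.reshape(h_out, size, w_out, size)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])
pooled = max_pool2d(feature_map)
print(pooled)
```

Each output value keeps only the strongest response in its window, which halves the spatial resolution while preserving the most prominent features.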

LeNet is the simplest form of a convolutional neural network, with a quite shallow architecture. Nevertheless, the authors of the paper used this architecture successfully for character recognition and note that it is applicable to a wide range of pattern recognition problems.

## AlexNet

In the article titled "ImageNet Classification with Deep Convolutional Neural Networks", Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton presented a deep convolutional neural network that won the 2012 "ImageNet Large Scale Visual Recognition Challenge" (ILSVRC) by a large margin. It is safe to say that ConvNets have dominated the competition rankings since 2012.

In this paper, the authors created a relatively simple ConvNet layout. The network consists of 5 convolutional layers, 3 max pooling layers, and 3 fully connected layers at the end.

<img align="center" src="imgs/img2.png"/>
<p>Image 2: Architecture of AlexNet (Image credits: Krizhevsky, 2012)</p>
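
To get a feel for this layout, the sketch below traces the feature map shape through the convolutional and pooling layers. Kernel sizes, strides, and channel counts follow Krizhevsky et al. (2012); the $227\times227$ input size and the padding values are assumptions borrowed from common reimplementations, and the helper names `out_size` and `trace` are hypothetical.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, output channels); None means channels unchanged.
# Padding values are assumed, not stated in the paper.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, None),
]

def trace(input_size=227, channels=3):
    """Return (name, channels, height, width) after each layer in LAYERS."""
    shapes = []
    size = input_size
    for name, kernel, stride, padding, out_channels in LAYERS:
        size = out_size(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, channels, size, size))
    return shapes

shapes = trace()
for name, c, h, w in shapes:
    print(f"{name}: {c} x {h} x {w}")

# Number of values handed to the first fully connected layer.
_, c, h, w = shapes[-1]
flattened = c * h * w
print("flattened:", flattened)
```

Under these assumptions the last pooling layer produces a $256\times6\times6$ volume, so the first fully connected layer receives 9216 inputs.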

The main contributions of AlexNet can be summed up as follows:

- Use of the rectified linear unit (ReLU): Before this paper, the most commonly used non-linearity was either the hyperbolic tangent or the sigmoid function. The authors showed a noticeable difference in learning speed when using ReLU in comparison with these saturating non-linearities.

- Data augmentation: The authors used data augmentation to combat overfitting. Transformed images were generated by Python code on the CPU while the GPU was training on the previous batch, so that, in their words, "... data augmentation schemes are, in effect, computationally free."

- Dropout layers: The authors used dropout layers to combat overfitting to the training data.

- Momentum in stochastic gradient descent: They trained the network using stochastic gradient descent with a momentum of 0.9 to speed up learning.

- Training using multiple GPUs: AlexNet is designed