# VGG Neural Network Architecture
*by Marvin Bertin*
<img src="../../images/keras-tensorflow-logo.jpg" width="400">

# VGG Origins

VGG is a convolutional neural network model introduced in 2014 by the **Visual Geometry Group** (VGG) at the University of Oxford. Convolutional networks were already the state of the art in visual recognition. The team's main contribution was a significant improvement on the prior state-of-the-art convolutional networks, achieved with a network that was, for its time, substantially deeper. CNNs have since become much deeper (hundreds of layers), but many still build on the block architecture developed by the VGG team.

# VGG Paper

The corresponding paper, [*Very Deep Convolutional Networks for Large-Scale Image Recognition*](https://arxiv.org/pdf/1409.1556.pdf), is a rigorous evaluation of the network architecture as its depth is increased.

<img src="../../images/VGG-paper.png" width="800">

# VGG Result

In summary, the VGG team secured first and second place in the 2014 ImageNet Challenge (ILSVRC) in the localisation and classification tasks, respectively. ImageNet as a whole contains over 14 million images; the challenge uses a subset of roughly 1.2 million training images belonging to 1000 classes.

The results are summarized below:

<img src="../../images/VGG-result.png" width="800">


# VGG Architecture

The VGG team created an extremely homogeneous architecture that uses only 3x3 convolutional layers stacked on top of each other in increasing depth. Reducing the spatial size of the volumes is handled by 2x2 max pooling. The convolutional layers are followed by three fully-connected layers (two with 4096 units each and a final 1000-way layer), and the network ends with a softmax classifier.

The architecture comes in two sizes, “VGG16” and “VGG19”, named for the number of parameterized (weight) layers in the network. They correspond to configurations D and E in the paper (best performance by configuration D):

<img src="../../images/VGGNet.png" width="500">

## VGG16 Architecture

<img src="../../images/vgg16.png" width="500">


**Main Points of the VGG architecture:**

- Only 3x3 filters are used, which is small compared to previous models that used 11x11 and 7x7 filters. However, a stack of two 3x3 conv layers has an effective receptive field of 5x5. This simulates a larger filter while keeping the benefits of smaller filter sizes. One benefit is a decrease in the number of parameters: with C input and output channels, two 3x3 layers use 2 × 9C² = 18C² weights, versus 25C² for a single 5x5 layer. Also, with two conv layers, we’re able to use two ReLU nonlinearities instead of one.

- Three 3x3 conv layers back to back have an effective receptive field of 7x7.

- As the spatial size of the volumes decreases through the network (a result of the max-pooling layers, since the padded 3x3 convolutions preserve spatial size), the depth of the volumes increases because later layers use more filters.

- The number of filters doubles after each max-pool layer, from 64 up to 512, and then stays at 512 for the last conv block. This reinforces the idea of shrinking spatial dimensions but growing depth.

- Works well on both image classification and localization tasks. Localization is treated as a regression task.

- Uses a ReLU activation after each conv layer and is trained with mini-batch gradient descent with momentum.
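The receptive-field and parameter claims in the list above can be checked with a little arithmetic. The sketch below uses two hypothetical helper functions and assumes stride-1 convolutions with the same number of input and output channels (64 here, chosen only for illustration) and no bias terms.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of stride-1 conv layers stacked back to back.

    Each extra layer lets an output unit see (kernel - 1) more input pixels
    along each spatial dimension.
    """
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (no biases) for conv layers with equal in/out channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative channel count

# Two 3x3 layers cover the same area as one 5x5 layer, with fewer weights.
print(stacked_receptive_field(2), conv_weights(3, channels, num_layers=2),
      conv_weights(5, channels))

# Three 3x3 layers cover the same area as one 7x7 layer, with fewer weights.
print(stacked_receptive_field(3), conv_weights(3, channels, num_layers=3),
      conv_weights(7, channels))
```

With C channels the counts are 18C² versus 25C² for the 5x5 case and 27C² versus 49C² for the 7x7 case, so the saving grows as the simulated filter gets larger.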

**Downsides of the VGG architecture:**

- It can be slow to train on large datasets because the number of model parameters is quite large (roughly 138 million for VGG16), due to its depth and its large fully-connected layers. The resulting model size also makes deploying a VGG network difficult.

- Smaller network architectures, such as SqueezeNet, have since been proposed with comparable performance.

## Next Lesson
### VGG with TensorFlow-Keras
- You will implement an improved version of the VGG network in TensorFlow-Keras.

<img src="../../images/divider.png" width="100">