# VGG
The VGG architecture, which won the localization task and placed second in image classification at the 2014 ImageNet ILSVRC contest, is considered the father of modern CNNs, while AlexNet is considered the
grandfather. VGG formalized the concept of constructing a CNN from components and groups that follow a pattern. Prior to VGG, CNNs were constructed as ConvNets, whose usefulness did not go beyond academic novelties.

VGGs were the first to have practical applications in production. For several years after their development, researchers continued to compare newer SOTA architectures to the VGG and to use VGGs as the classification backbone of early SOTA object-detection models.

The VGG, along with Inception, formalized the concept of having a first convolutional group that does coarse-level feature extraction, which we now refer to as the
stem component. Subsequent convolutional groups then do **finer levels of feature extraction and feature learning**, which we now refer to as **representational learning**; hence the term learner for this second major component.

Researchers eventually discovered a drawback of the VGG stem: it retained the size of the input (224 × 224) in the extracted coarse feature maps, resulting in an unnecessary number of parameters entering the learner. The quantity of parameters both increased the memory footprint and slowed training and prediction. Researchers addressed this problem in later SOTA models by adding pooling to the stem component, which reduced the output size of the coarse-level
feature maps. This change decreased the memory footprint and improved speed without a loss in accuracy.
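
To put rough numbers on this drawback, the following sketch counts the feature-map values a stem hands to the learner, with and without a pooling step. The 2 × 2 max pooling with stride 2 is an assumption chosen for illustration, not the stem of any particular later model, and the helper names `pooled_size` and `feature_map_values` are hypothetical.

```python
def pooled_size(size, pool=2, stride=2):
    """Spatial size after pooling with no padding (illustrative helper)."""
    return (size - pool) // stride + 1


def feature_map_values(height, width, channels):
    """Number of values in a stack of feature maps."""
    return height * width * channels


# VGG-style stem: 64 feature maps that keep the 224 x 224 input size
without_pool = feature_map_values(224, 224, 64)

# Same stem followed by an assumed 2 x 2 pooling with stride 2
side = pooled_size(224)
with_pool = feature_map_values(side, side, 64)

print(without_pool)               # 3211264
print(with_pool)                  # 802816
print(without_pool // with_pool)  # 4
```

Under these assumptions, halving each spatial dimension cuts the volume entering the learner by a factor of four.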

The VGG stem component, depicted in figure 5.3, was designed to take a 224 × 224 × 3 image as input and to output 64 feature maps, each 224 × 224 in size. In other
words, the VGG stem group did not do any size reduction of the feature maps.

<img src="img.png">


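```python
from keras.layers import Conv2D

def stem(inputs):
    """
    Construct the stem convolution group.
    inputs: input tensor (a 224 x 224 x 3 image in the original VGG)
    """
    # A single 3x3 convolution with 64 filters; stride 1 and "same" padding
    # keep the feature maps at the input's spatial size
    outputs = Conv2D(64, kernel_size=(3, 3), strides=(1, 1), padding="same",
                     activation="relu")(inputs)
    return outputs
```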
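
As a sanity check on the stem, its output shape and weight count can be worked out by hand without building the model. The sketch below is plain Python arithmetic rather than Keras code. The helpers `conv2d_output_shape` and `conv2d_param_count` are hypothetical names introduced here, and they assume the usual convention that "same" padding gives an output size of ceil(input / stride).

```python
import math


def conv2d_output_shape(height, width, filters, kernel=3, stride=1, padding="same"):
    """Output (height, width, channels) of a 2D convolution (illustrative helper)."""
    if padding == "same":
        # With "same" padding the spatial size depends only on the stride
        out_h = math.ceil(height / stride)
        out_w = math.ceil(width / stride)
    else:
        # "valid" padding: the kernel must fit entirely inside the input
        out_h = (height - kernel) // stride + 1
        out_w = (width - kernel) // stride + 1
    return out_h, out_w, filters


def conv2d_param_count(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


# The stem: 64 filters of size 3 x 3, stride 1, "same" padding, RGB input
stem_shape = conv2d_output_shape(224, 224, filters=64)
stem_params = conv2d_param_count(in_channels=3, filters=64)

print(stem_shape)   # (224, 224, 64)
print(stem_params)  # 1792
```

The stem layer holds few weights of its own. What is large is the 224 × 224 × 64 output it passes on to the learner.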