<a href="https://colab.research.google.com/github/ccarpenterg/LearningTensorFlow2.0/blob/master/05_pretrained_convnets_and_transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pretrained Convolutional Networks

Before tackling the problem of overfitting, we will explore the concept of a pretrained convnet. This is where the world of AI gets really exciting. Not only can we use a great ML framework like TensorFlow, developed by Google, one of the most advanced companies in terms of AI, but we can also download a pretrained convolutional neural network that has been trained by a company like Google or by a research institution like Stanford.


That means that years and years of research are **available** to everybody, provided they have the technical skills to use these pretrained convolutional neural networks.

Let's start by installing TensorFlow 2.0 and checking that we have the right version:



In [0]:
!pip install tensorflow==2.0.0-beta1

In [2]:
import tensorflow

print(tensorflow.__version__)

2.0.0-beta1


## VGG16

We'll start by exploring the VGG16 network. It was developed by the Visual Geometry Group at Oxford University, hence the name VGG. It has 16 weight layers (13 convolutional and 3 fully connected) and a little more than 138 million parameters (including weights and biases).

We can download the convnet structure and parameters through the Keras module by instantiating the class VGG16:



In [7]:
from tensorflow.keras.applications import VGG16

convnet = VGG16()

convnet.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0     
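
The values in the Param # column can be reproduced by hand. The following is a minimal sketch in plain Python, assuming the standard VGG16 layout: the filter counts for blocks 3 to 5 and the classifier sizes (4096, 4096 and 1000 units) are not visible in the truncated summary above and are filled in from the published architecture. The helper names are illustrative only.

```python
# Conv2D parameters = (kernel_h * kernel_w * in_channels + 1) * filters
# Dense parameters  = (in_features + 1) * units
# The "+ 1" accounts for one bias per filter or unit.

def conv2d_params(in_channels, filters, kernel=3):
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(in_features, units):
    return (in_features + 1) * units

# Filters per convolutional layer in each of VGG16's five blocks (assumed layout).
VGG16_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
                [512, 512, 512], [512, 512, 512]]

conv_counts = {}
channels = 3  # RGB input
for b, block in enumerate(VGG16_BLOCKS, start=1):
    for i, filters in enumerate(block, start=1):
        conv_counts[f"block{b}_conv{i}"] = conv2d_params(channels, filters)
        channels = filters

conv_total = sum(conv_counts.values())

# Classifier: 7 x 7 x 512 flattened features -> 4096 -> 4096 -> 1000 classes.
dense_total = (dense_params(7 * 7 * 512, 4096)
               + dense_params(4096, 4096)
               + dense_params(4096, 1000))

for name in ["block1_conv1", "block1_conv2", "block2_conv1", "block2_conv2"]:
    print(name, conv_counts[name])
print("convolutional base:", conv_total)
print("whole network:", conv_total + dense_total)
```

The first four values match the summary (1792, 36928, 73856 and 147584), and the total for the whole network comes to 138,357,544, most of which sits in the fully connected layers.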

So this is pretty much like downloading a brain, or a section of a brain to be more precise. Keep in mind, though, that we have to retrain parts of a pretrained network so that we can apply its knowledge to our particular problem.

Most of the time the pretrained convnet has been trained on a large dataset such as ImageNet, with classes very different from ours. So in order to apply it to our problem we have to remove the dense classifier, that is, the fully connected layers that sit on top of the convolutional and pooling layers.

### Feature Extraction and Fine-tuning

So why do we want to use a pretrained convnet if its classifier is not able to classify our images? The answer is **feature extraction**. A convnet's architecture consists of two parts: the convolutional base (the convolutional and pooling layers) and the classifier (the fully connected layers).

The convolutional base automatically extracts the features that are then fed to the fully connected layers for classification. Since the features extracted by the convolutional base are largely generic, in the sense that they are visual features found in almost any object (lines, edges, tones, etc.), we can reuse this convolutional base to extract features from our own dataset.

If we use the analogy of the human brain, we can say that the visual cortex is equivalent to our convolutional layers. They extract features like horizontal lines, vertical lines, curves, edges, etc.




In [8]:
from tensorflow.keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(224, 224, 3))

conv_base.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0     
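
To make feature extraction concrete, here is a minimal sketch of how a batch of images could be passed through the convolutional base. The helper name extract_features is hypothetical, and the sketch assumes the images are RGB arrays of shape (n, 224, 224, 3) with pixel values between 0 and 255. It applies Keras's VGG16 preprocess_input first, since the pretrained weights expect inputs prepared that way.

```python
import numpy as np
import tensorflow as tf

def extract_features(conv_base, images):
    """Run images through the convolutional base and flatten the feature maps."""
    # Copy to float32 so the caller's array is not modified by preprocessing.
    batch = np.array(images, dtype="float32")
    batch = tf.keras.applications.vgg16.preprocess_input(batch)
    features = conv_base.predict(batch)  # (n, 7, 7, 512) for 224x224 inputs
    return features.reshape(features.shape[0], -1)  # (n, 25088)
```

Calling extract_features(conv_base, images) would give one 25,088-dimensional feature vector per image, which a small classifier can then be trained on.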

### Convolution and Pooling Operations

Let's now take a look at the convolution and pooling operations and how they are set up in the VGG16 network:

In [5]:
for layer in conv_base.layers:
    if 'conv' in layer.name:
        print('{} {} kernels of {}, stride of {} and {} padding'.format(
                                                                    layer.name,
                                                                    layer.filters,
                                                                    layer.kernel_size,
                                                                    layer.strides,
                                                                    layer.padding))
        
    if 'pool' in layer.name:
        print('{} of size {}, stride of {} and {} padding'.format(
                                                                layer.name,
                                                                layer.pool_size,
                                                                layer.strides,
                                                                layer.padding))

block1_conv1 64 kernels of (3, 3), stride of (1, 1) and same padding
block1_conv2 64 kernels of (3, 3), stride of (1, 1) and same padding
block1_pool of size (2, 2), stride of (2, 2) and valid padding
block2_conv1 128 kernels of (3, 3), stride of (1, 1) and same padding
block2_conv2 128 kernels of (3, 3), stride of (1, 1) and same padding
block2_pool of size (2, 2), stride of (2, 2) and valid padding
block3_conv1 256 kernels of (3, 3), stride of (1, 1) and same padding
block3_conv2 256 kernels of (3, 3), stride of (1, 1) and same padding
block3_conv3 256 kernels of (3, 3), stride of (1, 1) and same padding
block3_pool of size (2, 2), stride of (2, 2) and valid padding
block4_conv1 512 kernels of (3, 3), stride of (1, 1) and same padding
block4_conv2 512 kernels of (3, 3), stride of (1, 1) and same padding
block4_conv3 512 kernels of (3, 3), stride of (1, 1) and same padding
block4_pool of size (2, 2), stride of (2, 2) and valid padding
block5_conv1 512 kernels of (3, 3), stride of (1, 

**Convolution**

Every convolutional layer uses a 3x3 filter (kernel) with a stride of 1 and same padding. As you can see, depending on the depth of the layer it applies 64, 128, 256 or 512 filters. Same padding means that the input is padded so that, with a stride of 1, the output height and width equal the input height and width.

**Max Pooling**

VGG16 uses max pooling of 2x2 with a stride of 2 for the pooling layers.
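
The effect of these settings on the spatial size can be worked out with a little arithmetic. The sketch below assumes the usual output-size rules for same and valid padding, and the standard VGG16 layout of 2, 2, 3, 3 and 3 convolutional layers per block. The function names are illustrative.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Output height/width of a conv or pooling layer along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError("unknown padding: " + padding)

def same_padding_total(size, kernel, stride):
    """Total padding (both sides combined) needed for same padding."""
    out = math.ceil(size / stride)
    return max((out - 1) * stride + kernel - size, 0)

size = 224
sizes = [size]
convs_per_block = [2, 2, 3, 3, 3]
for n_convs in convs_per_block:
    # 3x3 convolutions with stride 1 and same padding keep the size unchanged.
    for _ in range(n_convs):
        size = conv_output_size(size, kernel=3, stride=1, padding="same")
    # 2x2 max pooling with stride 2 and valid padding halves it.
    size = conv_output_size(size, kernel=2, stride=2, padding="valid")
    sizes.append(size)

print(sizes)
print(same_padding_total(224, kernel=3, stride=1))
```

The convolutions leave the 224x224 input untouched (one pixel of padding on each side), while each pooling layer halves it, so the feature maps shrink from 224 to 112, 56, 28, 14 and finally 7.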

### Freezing Layers and Preparing for Fine-Tuning

Before we train a new classifier on top of the pretrained network, and later fine-tune it, we need to freeze the convolutional base so that its pretrained weights are not updated during training. Since we're using TensorFlow's Keras API this is really straightforward: we just set conv_base's trainable attribute to False:

In [6]:
conv_base.trainable = False
conv_base.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0     

Now we see that there are zero trainable parameters, which means that VGG16's convolutional base has been frozen and we're ready to add a new classifier to our convolutional neural network.
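
As an illustration of that next step, here is a minimal sketch of how a new classifier might be stacked on the frozen base. The function name build_transfer_model, the 256-unit hidden layer and the softmax output are assumptions made for this example, not choices taken from the notebook.

```python
import tensorflow as tf

def build_transfer_model(num_classes, weights="imagenet"):
    """Frozen VGG16 convolutional base with a small new classifier on top."""
    # Pretrained convolutional base without VGG16's original classifier.
    conv_base = tf.keras.applications.VGG16(weights=weights,
                                            include_top=False,
                                            input_shape=(224, 224, 3))
    conv_base.trainable = False  # freeze the base

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = conv_base(inputs)
    x = tf.keras.layers.Flatten()(x)
    # Hidden layer size is an arbitrary choice for this sketch.
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling build_transfer_model(num_classes=2) would download the ImageNet weights on first use. Only the two Dense layers remain trainable, so training updates the new classifier and leaves the convolutional base untouched.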

### Other Pretrained Convnets

Keras also provides other pretrained convnets, including:



*   Xception
*   ResNet
*   Inception
*   MobileNet
*   DenseNet
*   NASNet
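
These families are all available under tensorflow.keras.applications and can be loaded in the same way as VGG16. The sketch below picks one concrete class per family (for instance ResNet50 for ResNet), which is an arbitrary choice, since most families come in several variants. The helper name load_conv_base is hypothetical.

```python
from tensorflow.keras import applications

# One concrete class per family listed above; each family has other variants.
PRETRAINED_CONVNETS = {
    "Xception": applications.Xception,
    "ResNet": applications.ResNet50,
    "Inception": applications.InceptionV3,
    "MobileNet": applications.MobileNet,
    "DenseNet": applications.DenseNet121,
    "NASNet": applications.NASNetMobile,
}

def load_conv_base(name, weights="imagenet", input_shape=(224, 224, 3)):
    """Return the named convnet without its classifier, with frozen weights."""
    base = PRETRAINED_CONVNETS[name](weights=weights,
                                     include_top=False,
                                     input_shape=input_shape)
    base.trainable = False
    return base
```

For example, load_conv_base("MobileNet") would return a frozen MobileNet base, which is far smaller than VGG16 and so a reasonable choice when memory or speed matters.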

