# Exercise 08 - CNN Architectures for Image Classification

This notebook shows how to implement complex CNN architectures (for image classification) in a simple and flexible way. We use the Subclassing API to group a sequence or collection of layers into more complex layers (or modules) that can be reused many times. At the same time, we need the Functional API to describe the flow of data (tensors) through the layers.

**Learning objectives:**
- Learn how to build more complex neural network architectures
- Get to know the Subclassing and Functional API of TensorFlow
- Practice implementing some modules and architectures yourself

**The constructed neural network architectures are just constructed in this notebook, but not trained in any way. Check and compare your solutions with the network summaries in the accompanying PDF notebook version.**

Please note that we have implemented the suggested solution carefully and to the best of our knowledge. If your solution looks slightly different, then your solution is not necessarily wrong. Talk to us and we will check what the differences could be. We, too, make mistakes.

**Before you continue, find a GPU on the system that is not heavily used by other users (with nvidia-smi), and change X to the id of this GPU.**

In [1]:
# Change X to the GPU number you want to use,
# otherwise you will get a Python error
# e.g. USE_GPU = 4
USE_GPU = 4

In [2]:
# Import TensorFlow 
import tensorflow as tf

# Print the installed TensorFlow version
print(f'TensorFlow version: {tf.__version__}\n')

# Get all GPU devices on this server
gpu_devices = tf.config.list_physical_devices('GPU')

# Print the name and the type of all GPU devices
print('Available GPU Devices:')
for gpu in gpu_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set only the GPU specified as USE_GPU to be visible
tf.config.set_visible_devices(gpu_devices[USE_GPU], 'GPU')

# Get all visible GPU  devices on this server
visible_devices = tf.config.get_visible_devices('GPU')

# Print the name and the type of all visible GPU devices
print('\nVisible GPU Devices:')
for gpu in visible_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set the visible device(s) to not allocate all available memory at once,
# but rather let the memory grow whenever needed
for gpu in visible_devices:
    tf.config.experimental.set_memory_growth(gpu, True)

2024-01-11 07:41:58.405406: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


TensorFlow version: 2.12.0

Available GPU Devices:
  /physical_device:GPU:0 GPU
  /physical_device:GPU:1 GPU
  /physical_device:GPU:2 GPU
  /physical_device:GPU:3 GPU
  /physical_device:GPU:4 GPU
  /physical_device:GPU:5 GPU
  /physical_device:GPU:6 GPU
  /physical_device:GPU:7 GPU

Visible GPU Devices:
  /physical_device:GPU:4 GPU


## ResNet34

The following example shows how to implement the ResNet CNN architecture with 34 (trainable) layers. The ResNet architecture repeats residual blocks several times; the blocks differ only in the number of convolutional filters they use, and in using a stride of 2 whenever the number of filters changes, in order to shrink the activation volumes. We therefore define a new module (as a TensorFlow/Keras layer) that is parameterized in exactly this way and that we can use over and over again. Besides the residual blocks, the ResNet architecture has a stump (at the beginning) that quickly shrinks the image with a large filter size, and a classifier head (at the end).

The implementation therefore follows these four steps:
1. Define a residual block module that can be configured and re-used in the main block with the Subclassing API
2. Construct a neural network stump model using the Sequential API (although we could also use the Functional API for this)
3. Connect the stump with a sequence of configured residual blocks
4. Add a classifier as the head

Please note that the following implementation is very simplified and only exposes the necessary configurations (filter number, stride) and abstractions for the architectures of this notebook. Typically, one would probably also allow some more flexibility like the configuration of the activation function or the kernel initializer. And often, layers that are repeated several times in a module, like the Conv2D layer, would be specified in a new object, so that the specifications of that particular layer can be changed at only one place. For example, the Conv2D layer is repeated with the same configuration in the residual block three times. If we would like to change something, e.g. the activation function of the Conv2D layer to use leaky ReLU instead of ReLU, then we would need to change it also at three different locations, which could lead to errors, because we must not forget any of these three layers.
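
One common way to specify a repeated layer in a single place is `functools.partial`. The following is a minimal sketch of that idea, not the notebook's solution; the name `DefaultConv2D` is a hypothetical helper introduced here for illustration only.

```python
from functools import partial

import tensorflow as tf

# Hypothetical helper (not part of the notebook): a Conv2D "template" that
# fixes the arguments shared by the convolutions of the residual block.
DefaultConv2D = partial(
    tf.keras.layers.Conv2D,
    kernel_size=3,
    strides=1,
    padding='same',
    kernel_initializer='he_normal',
    use_bias=False,
)

# Individual layers only state what differs from the template.
conv_main_1 = DefaultConv2D(128, strides=2)
conv_main_2 = DefaultConv2D(128)
conv_skip = DefaultConv2D(128, kernel_size=1, strides=2)

print(conv_main_1.kernel_size, conv_main_1.strides)
print(conv_skip.kernel_size, conv_skip.strides)
```

With such a template, a change like a different kernel initializer would be made once and apply to every layer created from it.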

**Define the residual block layer**

All neural networks of the ResNet family consist in their center part of several stacked residual blocks, which differ only in their number of filters. Therefore, using the Subclassing API of TensorFlow, we define a new Keras layer class called `ResidualBlock` that is a subclass of the superclass `Layer` (of the layers module of Keras). Then, we can use this residual block over and over again.

For such layer classes, we need to define the constructor (`__init__()`) that is called when the object of that class is constructed, and the method `call()` that is called in the forward pass (of the network that contains this residual block).

In the `__init__()` method, the layers of the main path and of the skip connection are defined:
- The main path is pretty straight-forward and is just a sequence of a 2D convolutional layer, batch normalization, activation function (typically ReLU), another 2D convolutional layer, and batch normalization.
- The skip connection path is typically empty, so its output is equal to its input. Only when the stride of the residual block is greater than one does the activation volume shrink in the main path; the skip connection must then shrink the activation volume accordingly, as otherwise the two results cannot be added together at the end of this block.

As you might notice, the layers are at this point just constructed and stored (as a sequence) in Python lists. (With the exception of the last ReLU activation function, which is just stored in a variable.) The layers are at this point not connected in any way.

In the `call()` method, the path of the inputs through the layers is defined. For one, the input goes through the layers of the main path, and for another, through the skip connection. (If the skip connection path is empty, the input remains unchanged, otherwise it shrinks the activation volume.) The resulting tensors from the two paths are then added together, and the result goes one more time through an activation function. In this method, we basically call the layers (contained in the lists we constructed in the constructor method) in the right order using the input that is provided to this method. The output of calling a layer object is then the input of the next layer. So, we loop through the layers in the lists, call the layer with the input (which might be the output of the previous layer), and store the output of this layer again in the variable, which is then used in the next iteration as the input of the next layer. Before the last ReLU activation function is called, the results from the main path and from the skip connection path are added (elementwise) with the `+` operator. This is how we define the data (tensor) flow through the layers (and therefore the neural network architecture) with the Functional API.
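
Why the skip connection has to shrink the activation volume as well can be seen from the tensor shapes alone. The sketch below uses NumPy arrays as stand-ins for the outputs of the two paths (an assumption made for illustration); the shapes correspond to a block with 128 filters and a stride of 2 applied to a 56x56x64 input.

```python
import numpy as np

# Stand-ins for the tensors (batch, height, width, channels)
inputs = np.zeros((1, 56, 56, 64))      # input of the residual block
main_out = np.zeros((1, 28, 28, 128))   # main path with stride 2 and 128 filters

# An empty skip path would pass the input through unchanged,
# so the elementwise addition fails because the shapes differ.
try:
    _ = main_out + inputs
    shapes_match = True
except ValueError:
    shapes_match = False

# After a 1x1 convolution with stride 2 and 128 filters, the skip output
# has the same shape as the main path output and can be added.
skip_out = np.zeros((1, 28, 28, 128))
result = main_out + skip_out

print(shapes_match, result.shape)
```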

In [3]:
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU

class ResidualBlock(tf.keras.layers.Layer):
    
    def __init__(self, filters, strides=1, **kwargs):
        super().__init__(**kwargs)
                
        # layers of main path
        self.main_layers = [
            Conv2D(filters, kernel_size=3, strides=strides, padding='same', kernel_initializer='he_normal', use_bias=False),            
            BatchNormalization(),
            ReLU(),
            Conv2D(filters, kernel_size=3, strides=1, padding='same', kernel_initializer='he_normal', use_bias=False),            
            BatchNormalization()
        ]
        
        # layers of skip connection path
        self.skip_layers = []
        
        # if the stride is greater than 1, then use a 1x1 convolution with
        # the same stride (and batch normalization) to shrink the skip path
        if strides > 1:
            self.skip_layers = [
                Conv2D(filters, kernel_size=1, strides=strides, padding='same', kernel_initializer='he_normal', use_bias=False),            
                BatchNormalization()
            ]
                        
        self.activation = ReLU()
    
    def call(self, inputs):
        
        # main path
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
            
        # skip connection path
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
            
        # add the two results,
        # and apply activation function
        return self.activation(Z + skip_Z)

**Construct the network stump**

Next, we define the ResNet stump that takes the input and applies a 7x7 convolution with a stride of 2, batch normalization, the ReLU activation function, and max pooling. For simplicity, we use the Sequential model for this.

In [4]:
from tensorflow.keras.layers import MaxPool2D

model = tf.keras.Sequential([
    Conv2D(64, kernel_size=7, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False, input_shape=[224, 224, 3]),   
    BatchNormalization(),
    ReLU(),
    MaxPool2D(pool_size=3, strides=2, padding='same')
])

2024-01-11 07:42:05.833122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14744 MB memory:  -> device: 4, name: Quadro RTX 5000, pci bus id: 0000:81:00.0, compute capability: 7.5


**Construct and connect the residual blocks**

At this point, we want to construct and connect a number of residual blocks to the network stump. In the ResNet34 network, there are **3 blocks of 64 filters, then 4 blocks of 128 filters, 6 blocks of 256 filters, and 3 blocks of 512 filters**. We therefore define a list with one entry for each residual block that gives its number of filters. For this purpose, we can use the `*` operator that is defined for lists to repeat the elements of a list the given number of times. For the filter numbers of the residual blocks, we define the list as follows.

In [5]:
[64] * 3 + [128] * 4 + [256] * 6 + [512] * 3

[64, 64, 64, 128, 128, 128, 128, 256, 256, 256, 256, 256, 256, 512, 512, 512]

We now just need to iterate through this list, construct a residual block with the number of filters from the current list element, and add the residual block to the model defined above.

There is just one more thing: Whenever the number of filters increases, ResNet decreases the size of the activation volume by using a stride of 2 instead of 1. To accommodate this, we keep the previous number of filters, compare it with the current number of filters, and if they are not the same, we use a stride of 2, otherwise a stride of 1. (We use Python's conditional expression to write this in one line.)

In [6]:
prev_filters = 64

for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    strides = 1 if filters == prev_filters else 2
    model.add(ResidualBlock(filters, strides=strides))
    prev_filters = filters
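
To see what this stride schedule does to the activation volume, the spatial size can be traced by hand. The sketch below assumes 'same' padding throughout, where the output size is the input size divided by the stride and rounded up; the helper `same_out` is a hypothetical name used only here.

```python
import math

def same_out(size, stride):
    # Output size of a layer with 'same' padding: ceil(size / stride)
    return math.ceil(size / stride)

size = 224
size = same_out(size, 2)   # stump: 7x7 convolution with stride 2
size = same_out(size, 2)   # stump: max pooling with stride 2

block_filters = [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3
prev_filters = 64
sizes = []
for filters in block_filters:
    strides = 1 if filters == prev_filters else 2
    size = same_out(size, strides)
    sizes.append((filters, size))
    prev_filters = filters

# Keep the first block of each group to see where the size changes
first_of_group = [sizes[0], sizes[3], sizes[7], sizes[13]]
print(first_of_group)  # [(64, 56), (128, 28), (256, 14), (512, 7)]
```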

Alternatively, we could keep track of the residual block number in the for-loop using `enumerate()`, and give the block a meaningful name.

(Make sure you do not call cells that add layers to models several times, as you would add layers over and over again. Not only will your model unintentionally get bigger and bigger, but you may also get errors if the layers don't match each other in terms of their configuration. If you need to change something or want to use the following alternative cell, then construct the sequential model once more.)

In [7]:
model = tf.keras.Sequential([
    Conv2D(64, kernel_size=7, strides=2, padding='same', kernel_initializer='he_normal', use_bias=False, input_shape=[224, 224, 3]),   
    BatchNormalization(),
    ReLU(),
    MaxPool2D(pool_size=3, strides=2, padding='same')
])

prev_filters = 64

for i, filters in enumerate([64] * 3 + [128] * 4 + [256] * 6 + [512] * 3):
    strides = 1 if filters == prev_filters else 2
    model.add(ResidualBlock(filters, strides=strides, name=f'ResBlock_{i+1:02}'))
    prev_filters = filters

**Add classifier head**

What is missing is the classifier head: global average pooling is applied to the activation volume resulting from the last residual block, the result is flattened, and a dense layer produces the class probabilities (here for ten classes, using the softmax function).

In [8]:
from tensorflow.keras.layers import GlobalAvgPool2D, Flatten, Dense

model.add(GlobalAvgPool2D())
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
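
Global average pooling takes the mean over the two spatial axes, so each feature map is reduced to a single value. The NumPy sketch below (a stand-in for the Keras layer, with random values as placeholder activations) shows that the result already has the shape (batch, channels), which suggests that the subsequent `Flatten` layer leaves it unchanged.

```python
import numpy as np

# Placeholder activations of the last residual block: (batch, 7, 7, 512)
rng = np.random.default_rng(0)
activations = rng.normal(size=(2, 7, 7, 512))

# Global average pooling: mean over height and width
pooled = activations.mean(axis=(1, 2))

# Flattening keeps the batch axis and merges all remaining axes
flattened = pooled.reshape(pooled.shape[0], -1)

print(pooled.shape, flattened.shape)  # (2, 512) (2, 512)
```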

Check with the `summary()` method that the architecture of the ResNet34 model is really like you expect it to be.

In [9]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_36 (Conv2D)          (None, 112, 112, 64)      9408      
                                                                 
 batch_normalization_36 (Bat  (None, 112, 112, 64)     256       
 chNormalization)                                                
                                                                 
 re_lu_33 (ReLU)             (None, 112, 112, 64)      0         
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 56, 56, 64)       0         
 2D)                                                             
                                                                 
 ResBlock_01 (ResidualBlock)  (None, 56, 56, 64)       74240     
                                                                 
 ResBlock_02 (ResidualBlock)  (None, 56, 56, 64)      
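
The parameter counts in this summary can be reproduced by hand, which is a useful check when comparing solutions. The sketch below assumes the layer configuration from above (convolutions without bias, and four parameters per channel for batch normalization: scale, shift, moving mean, and moving variance); the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters, use_bias=False):
    # Weights of a 2D convolution (plus one bias per filter if used)
    return kernel * kernel * in_channels * filters + (filters if use_bias else 0)

def bn_params(channels):
    # gamma, beta, moving mean, and moving variance per channel
    return 4 * channels

stump_conv = conv_params(7, 3, 64)
stump_bn = bn_params(64)

# Residual block with 64 filters and stride 1: two 3x3 convolutions,
# each followed by batch normalization, and an empty skip path
res_block_64 = 2 * conv_params(3, 64, 64) + 2 * bn_params(64)

print(stump_conv, stump_bn, res_block_64)  # 9408 256 74240
```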

**Question 1: Does the constructed ResNet34 model really contain 34 layers as the name suggests?** 

Starting with the ResNet50 architecture (ResNet50, ResNet101, and ResNet152), the residual blocks also contain bottleneck (1x1 Conv2D) layers in the main path. The first (1x1) bottleneck layer is used **instead** of the first (3x3 Conv2D) 'regular' layer. The second bottleneck layer comes after the second Conv2D layer, also in the main path. So, the ResNet34 architecture can be easily transformed into the ResNet50 architecture by just changing the residual block accordingly.

## VGG16

**Task: Implement the VGG16 neural network architecture with 'VGG blocks' in the same way as in the above example.**

The VGG16 network is much simpler than the ResNet architecture, but you should also start by implementing a 'VGG block' class (called `VGGBlock`). A VGG block consists of 2 or 3 convolutional (`Conv2D`) layers, and then a max pooling (`MaxPool2D`) layer. (The VGG19 network also includes blocks of 4 convolutional layers, and this VGG block class should also be able to construct such blocks.)

The constructor `__init__()` should take two parameters: one for the number of convolutional layers (where you specify if there should be 2, 3, or 4 convolutional layers), and the number of filters. Within the constructor, just construct as many `Conv2D` layers as specified in the parameter (a simple for-loop). The kernel size of the convolutional layers is (3,3), use zero padding ('same'), and ReLU ('relu') as the activation function. Since VGG does not have batch normalization layers between the convolutional layers and the activation function, you can just specify the Conv2D layers to use 'relu' as activation. The `MaxPool2D` layers use a pool size (`pool_size`) of (2,2), and strides of (2,2).

In the forward pass, the `call()` method, the input is just passed through all the constructed layers, and the output of the last layer is returned. There are no parallel paths.

In [41]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, ReLU, MaxPooling2D

class VGGBlock(tf.keras.layers.Layer):
    def __init__(self, num_conv_layers, filters, **kwargs):  
        super(VGGBlock, self).__init__(**kwargs)
        
        self.num_conv_layers = num_conv_layers
        self.filters = filters
        
        self.conv_layers = []
        for _ in range(num_conv_layers):
            self.conv_layers.append(Conv2D(filters, (3, 3), use_bias=True, padding='same', activation='relu'))
        
        self.max_pooling = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
            
    def call(self, inputs):
        x = inputs
        for conv_layer in self.conv_layers:
            x = conv_layer(x)
        x = self.max_pooling(x)
        return x
  

The VGG16 network does not have a real stump, and basically starts directly with a VGG block. So, it is sufficient to construct a Sequential model with only an `Input` layer that defines the input shape (`shape` parameter). (But that also depends a little on how you defined your VGG block.)

In [42]:
#model = tf.keras.Sequential([
#    Conv2D(64, kernel_size=(3,3), padding='same', use_bias=True, activation="relu", input_shape=[224, 224, 3]) 
#])
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input

input_shape = (224, 224, 3) 
model = Sequential()
model.add(Input(shape=input_shape))

After the stump, there are five VGG blocks:
- Block with 2 convolutional layers using 64 filters
- Block with 2 convolutional layers using 128 filters
- Block with 3 convolutional layers using 256 filters
- Block with 3 convolutional layers using 512 filters
- Block with 3 convolutional layers using 512 filters

(Please note that the VGG16 network is often depicted with only two convolutional layers in the 3rd block. But together with the three dense layers of the classifier head, this would only result in 15 trainable layers. Same goes for the VGG19 network, where the 3rd block is often depicted wrongly.)

Add these VGG blocks to the model.

Note that blocks 4 and 5 cannot be merged into a block of 6 convolutional layers using 512 filters, since there is a max pooling layer after block 4 that we would otherwise miss.

Since there are no real repetitions of layers, you do not need to first construct a list that stores the number of layers and the number of filters, but you can directly add the VGG blocks to the model. (If you created a list first, you would also need to store tuples that contain these two parameter values instead of just the number of filters as above.)

You could also just add the VGG blocks directly after the Input layer when you construct the Sequential model above. Same goes for the following classifier.
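
If you did want to drive the construction from a list, each entry would be a tuple, as noted above. The plain-Python sketch below only holds the configuration and counts the trainable layers; `vgg16_config` and `trainable_layers` are hypothetical names, and the second list is the often-depicted variant with only two convolutional layers in the 3rd block.

```python
# (number of convolutional layers, number of filters) per VGG block
vgg16_config = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
depicted_config = [(2, 64), (2, 128), (2, 256), (3, 512), (3, 512)]

DENSE_LAYERS = 3  # dense layers of the classifier head

def trainable_layers(config):
    # Convolutional layers of all blocks plus the dense layers
    return sum(num_conv for num_conv, _ in config) + DENSE_LAYERS

print(trainable_layers(vgg16_config))     # 16
print(trainable_layers(depicted_config))  # 15
```

A loop over `vgg16_config` could then pass each tuple to the block class, for example `model.add(VGGBlock(num_conv, filters))`.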

In [43]:
#model = tf.keras.Sequential([
#   Conv2D(64, kernel_size=(3,3), padding='same', use_bias=True, activation="relu", input_shape=[224, 224, 3]), 
    
#])

model.add(VGGBlock(num_conv_layers=2, filters=64, name='vgg_block_1'))
model.add(VGGBlock(num_conv_layers=2, filters=128, name='vgg_block_2'))
model.add(VGGBlock(num_conv_layers=3, filters=256, name='vgg_block_3'))
model.add(VGGBlock(num_conv_layers=3, filters=512, name='vgg_block_4'))
model.add(VGGBlock(num_conv_layers=3, filters=512, name='vgg_block_5'))


And finally, the VGG16 network has a classifier head that flattens the activation volume, has two dense layers of 4096 units (using ReLU as activation function), and one dense layer with as many neurons as classes (here 10) and a softmax activation. Add the layers accordingly.

In [44]:
from tensorflow.keras.layers import Flatten, Dense

model.add(Flatten())
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(10, activation='softmax'))

Use summary to inspect and verify your model.

In [45]:
model.summary()

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg_block_1 (VGGBlock)      (None, 112, 112, 64)      38720     
                                                                 
 vgg_block_2 (VGGBlock)      (None, 56, 56, 128)       221440    
                                                                 
 vgg_block_3 (VGGBlock)      (None, 28, 28, 256)       1475328   
                                                                 
 vgg_block_4 (VGGBlock)      (None, 14, 14, 512)       5899776   
                                                                 
 vgg_block_5 (VGGBlock)      (None, 7, 7, 512)         7079424   
                                                                 
 flatten_7 (Flatten)         (None, 25088)             0         
                                                                 
 dense_19 (Dense)            (None, 4096)            
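
The block parameter counts in this summary can be checked by hand. The sketch assumes 3x3 convolutions with bias, as specified for the VGG block, and an RGB input with three channels; the helper names are hypothetical.

```python
def conv_params(in_channels, filters, kernel=3):
    # kernel * kernel * in_channels weights plus one bias, per filter
    return (kernel * kernel * in_channels + 1) * filters

def vgg_block_params(in_channels, num_conv_layers, filters):
    # The first convolution sees the block input, the others see `filters` channels
    total = conv_params(in_channels, filters)
    total += (num_conv_layers - 1) * conv_params(filters, filters)
    return total

config = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
in_channels = 3
block_params = []
for num_conv_layers, filters in config:
    block_params.append(vgg_block_params(in_channels, num_conv_layers, filters))
    in_channels = filters

print(block_params)  # [38720, 221440, 1475328, 5899776, 7079424]
```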

When the parameter `expand_nested` is set to True, the summary also contains the nested layers (when a layer object consists of other layers). This does not always seem to work, e.g. if there are parallel paths as in the ResNet architecture.

In [46]:
model.summary(expand_nested=True)

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg_block_1 (VGGBlock)      (None, 112, 112, 64)      38720     
                                                                 
 vgg_block_2 (VGGBlock)      (None, 56, 56, 128)       221440    
                                                                 
 vgg_block_3 (VGGBlock)      (None, 28, 28, 256)       1475328   
                                                                 
 vgg_block_4 (VGGBlock)      (None, 14, 14, 512)       5899776   
                                                                 
 vgg_block_5 (VGGBlock)      (None, 7, 7, 512)         7079424   
                                                                 
 flatten_7 (Flatten)         (None, 25088)             0         
                                                                 
 dense_19 (Dense)            (None, 4096)            

**Question 2: What do you need to change to get from the VGG16 to the VGG 19 architecture?**

In [47]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, ReLU, MaxPooling2D, Input,Flatten, Dense
from tensorflow.keras.models import Sequential


class VGGBlock(tf.keras.layers.Layer):
    def __init__(self, num_conv_layers, filters, **kwargs):  
        super(VGGBlock, self).__init__(**kwargs)
        
        self.num_conv_layers = num_conv_layers
        self.filters = filters
        
        self.conv_layers = []
        for _ in range(num_conv_layers):
            self.conv_layers.append(Conv2D(filters, (3, 3), use_bias=True, padding='same', activation='relu'))
        
        self.max_pooling = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
            
    def call(self, inputs):
        x = inputs
        for conv_layer in self.conv_layers:
            x = conv_layer(x)
        x = self.max_pooling(x)
        return x

    
input_shape = (224, 224, 3) 
model = Sequential()
model.add(Input(shape=input_shape))
# VGG19: blocks 1 and 2 keep two convolutional layers,
# blocks 3, 4, and 5 use four convolutional layers each
model.add(VGGBlock(num_conv_layers=2, filters=64, name='vgg_block_1'))
model.add(VGGBlock(num_conv_layers=2, filters=128, name='vgg_block_2'))
model.add(VGGBlock(num_conv_layers=4, filters=256, name='vgg_block_3'))
model.add(VGGBlock(num_conv_layers=4, filters=512, name='vgg_block_4'))
model.add(VGGBlock(num_conv_layers=4, filters=512, name='vgg_block_5'))
model.add(Flatten())
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(units=4096,activation="relu"))
model.add(Dense(10, activation='softmax'))

model.summary()

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg_block_1 (VGGBlock)      (None, 112, 112, 64)      38720     
                                                                 
 vgg_block_2 (VGGBlock)      (None, 56, 56, 128)       221440    
                                                                 
 vgg_block_3 (VGGBlock)      (None, 28, 28, 256)       2065408   
                                                                 
 vgg_block_4 (VGGBlock)      (None, 14, 14, 512)       8259584   
                                                                 
 vgg_block_5 (VGGBlock)      (None, 7, 7, 512)         9439232   
                                                                 
 flatten_8 (Flatten)         (None, 25088)             0         
                                                                 
 dense_22 (Dense)            (None, 4096)            

## GoogLeNet

**Task: Implement the Inception module, and the GoogLeNet neural network architecture.**

When implementing the Inception block, there is no real benefit of organizing the layers in lists, since the four parallel paths have either one or two layers, only. Just store them in class variables (that might be named according to the path).

The Inception block does not have much variation, only in the number of filters that are used. However, there are six convolutional layers for which the number of filters are to be specified. These can be given to the Inception block class by a Python list, e.g. by `filters=[64, 96, 128, 12, 32, 32]`. Just be careful in which order you fill this list, and that you use the correct index for the respective convolutional layer. **Take a look at the definition of the main body of the architecture below.**

The four paths are specified as follows:
- Path 1: 1 convolutional layer with kernel size 1x1
- Path 2: 1 convolutional layer with kernel size 1x1, then 1 convolutional layer with kernel size 3x3
- Path 3: 1 convolutional layer with kernel size 1x1, then 1 convolutional layer with kernel size 5x5
- Path 4: 1 max pooling layer with pooling size 3x3 (strides of 1), then 1 convolutional layer with kernel size 1x1

All convolutional layers are with strides 1x1 (default), zero padding ('same'), and ReLU ('relu') activation function. 

The max pooling layer uses also zero padding to keep the size of the activation volume.

In the forward pass, the results of the four paths are concatenated along the last dimension (axis). You can use the `concat()` function of TensorFlow, which takes a Python list of tensors (the four variables that store the outputs of the four paths), and you need to specify the last axis, e.g. by using `-1`.
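
The effect of concatenating along the last axis can be illustrated with NumPy arrays as stand-ins for the four path outputs (an assumption for illustration; `np.concatenate` plays the role of the TensorFlow function here). The channel counts correspond to `filters=[64, 96, 128, 12, 32, 32]`.

```python
import numpy as np

# Outputs of the four paths: same batch and spatial size, different channels
p1 = np.zeros((1, 28, 28, 64))    # path 1: 1x1 convolution
p2 = np.zeros((1, 28, 28, 128))   # path 2: 3x3 convolution
p3 = np.zeros((1, 28, 28, 32))    # path 3: 5x5 convolution
p4 = np.zeros((1, 28, 28, 32))    # path 4: 1x1 convolution after max pooling

# Concatenate along the last (channel) axis
out = np.concatenate([p1, p2, p3, p4], axis=-1)

print(out.shape)  # (1, 28, 28, 256)
```

This only works because all four paths keep the spatial size of the input, which is why every layer in the block uses zero padding and a stride of 1.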

In [10]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, concatenate

class InceptionBlock(tf.keras.layers.Layer):
    def __init__(self, filters):
        super(InceptionBlock, self).__init__()
        self.path1 = Conv2D(filters[0], (1, 1), activation='relu', padding='same')

        self.path2 = Conv2D(filters[1], (1, 1), activation='relu', padding='same')
        self.path2_2 = Conv2D(filters[2], (3, 3), activation='relu', padding='same')

        self.path3 = Conv2D(filters[3], (1, 1), activation='relu', padding='same')
        self.path3_2 = Conv2D(filters[4], (5, 5), activation='relu', padding='same')

        self.path4 = MaxPooling2D((3, 3), strides=(1, 1), padding='same')
        self.path4_2 = Conv2D(filters[5], (1, 1), activation='relu', padding='same')

    def call(self, x):
        p1 = self.path1(x)
        p2 = self.path2(x)
        p2 = self.path2_2(p2)
        p3 = self.path3(x)
        p3 = self.path3_2(p3)
        p4 = self.path4(x)
        p4 = self.path4_2(p4)

        # Concatenate along the last axis
        return concatenate([p1, p2, p3, p4], axis=-1)



The GoogLeNet architecture is then defined as follows:

**Stump:**
- Input layer as in the previous two networks
- Convolutional layer with 64 filters of size (7x7) with strides (2x2)
- Max pooling layer (3x3) with strides (2x2)
- There comes a local response normalization at this point that you can skip for this notebook
- Convolutional layer with 64 filters of size (1x1) with strides (1x1)
- Convolutional layer with 192 filters of size (3x3) with strides (1x1)
- Again a local response normalization that you can skip
- Max pooling layer (3x3) with strides (2x2)

**Main body:**
- InceptionBlock(filters=[64, 96, 128, 12, 32, 32])
- InceptionBlock(filters=[128, 128, 192, 32, 96, 64])
- Max pooling layer (3x3) with strides (2x2)
- InceptionBlock(filters=[192, 96, 208, 16, 48, 64])
- InceptionBlock(filters=[160, 112, 224, 24, 64, 64])
- InceptionBlock(filters=[128, 128, 256, 24, 64, 64])
- InceptionBlock(filters=[112, 144, 288, 32, 64, 64])
- InceptionBlock(filters=[256, 160, 320, 32, 128, 128])
- Max pooling layer (3x3) with strides (2x2)
- InceptionBlock(filters=[256, 160, 320, 32, 128, 128])
- InceptionBlock(filters=[384, 192, 384, 48, 128, 128])

**The filters of the Inception modules are given as:**
- filters[0] -> only convolutional layer of 1st path
- filters[1] -> 1st convolutional layer (1x1) of 2nd path
- filters[2] -> 2nd convolutional layer (3x3) of 2nd path
- filters[3] -> 1st convolutional layer (1x1) of 3rd path
- filters[4] -> 2nd convolutional layer (5x5) of 3rd path
- filters[5] -> only convolutional layer (1x1) of 4th path
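
Since only the last layer of each path contributes to the concatenated output, the number of output channels of an Inception block follows directly from this index mapping. The sketch below works through the nine blocks of the main body; `out_channels` is a hypothetical helper.

```python
inception_filters = [
    [64, 96, 128, 12, 32, 32],
    [128, 128, 192, 32, 96, 64],
    [192, 96, 208, 16, 48, 64],
    [160, 112, 224, 24, 64, 64],
    [128, 128, 256, 24, 64, 64],
    [112, 144, 288, 32, 64, 64],
    [256, 160, 320, 32, 128, 128],
    [256, 160, 320, 32, 128, 128],
    [384, 192, 384, 48, 128, 128],
]

def out_channels(filters):
    # Last layer of paths 1 to 4: indices 0, 2, 4, and 5
    return filters[0] + filters[2] + filters[4] + filters[5]

channels = [out_channels(f) for f in inception_filters]
print(channels)  # [256, 480, 512, 512, 512, 528, 832, 832, 1024]
```

Comparing these values with the last dimension of each block's output shape in the summary is a quick way to spot a swapped index.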

**Classifier head:**
- Global average pooling (as in ResNet)
- A dropout (`Dropout`) layer with a dropout rate of 0.4 
- A fully connected layer that outputs the probabilities for ten classes 

**Ignore the auxiliary heads.**

All layers have zero padding, and use the ReLU activation function.

In [11]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D , GlobalAveragePooling2D ,Dropout,Dense,concatenate
from tensorflow.keras.models import Model

def stump(input_layer):
    x = Conv2D(64, (7, 7), activation='relu', strides=(2, 2), padding='same')(input_layer)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    x = Conv2D(64, (1, 1), activation='relu', padding='same')(x)
    x = Conv2D(192, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
    return x

def Inception_Block(x, filters):
    inception_block = InceptionBlock(filters)
    x = inception_block(x)
    return x

# Input layer
input_shape = (224, 224, 3)
input_layer = Input(shape=input_shape)

# Stump
x = stump(input_layer)

# Main body with Inception blocks
x = Inception_Block(x, filters=[64, 96, 128, 12, 32, 32])
x = Inception_Block(x, filters=[128, 128, 192, 32, 96, 64])
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = Inception_Block(x, filters=[192, 96, 208, 16, 48, 64])
x = Inception_Block(x, filters=[160, 112, 224, 24, 64, 64])
x = Inception_Block(x, filters=[128, 128, 256, 24, 64, 64])
x = Inception_Block(x, filters=[112, 144, 288, 32, 64, 64])
x = Inception_Block(x, filters=[256, 160, 320, 32, 128, 128])
x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)
x = Inception_Block(x, filters=[256, 160, 320, 32, 128, 128])
x = Inception_Block(x, filters=[384, 192, 384, 48, 128, 128])

# Classifier head
x = GlobalAveragePooling2D()(x)
x = Dropout(0.4)(x)
output_layer = Dense(10, activation='softmax')(x)

# Create the model
model = Model(inputs=input_layer, outputs=output_layer)


In the summary, you can now compare if the general structure of your network fits.

In [12]:
model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_7 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 conv2d_288 (Conv2D)         (None, 112, 112, 64)      9472      
                                                                 
 max_pooling2d_67 (MaxPoolin  (None, 56, 56, 64)       0         
 g2D)                                                            
                                                                 
 conv2d_289 (Conv2D)         (None, 56, 56, 64)        4160      
                                                                 
 conv2d_290 (Conv2D)         (None, 56, 56, 192)       110784    
                                                                 
 max_pooling2d_68 (MaxPoolin  (None, 28, 28, 192)      0         
 g2D)                                                      

**Question 3: How many trainable layers does GoogLeNet have?**

**Question 4: Why do ResNet and GoogLeNet have so few trainable parameters in comparison to VGG?**

## Answers to questions

**Question 1: Does the constructed ResNet34 model really contain 34 layers as the name suggests?**  
First, we count only the Conv2D and Dense layers and not the remaining layers (like ReLU, batch normalization, pooling, etc.). ReLU and pooling have no weights; batch normalization does have a few trainable parameters, but it is not counted as a layer of its own. Second, we only count the layers along the longest path. If there are parallel paths, as in ResNet, then only the layers in the longer path are counted, and the ones in the shorter path (like the skip connection path in ResNet) are not counted. In the given example of ResNet34, there is one trainable (Conv2D) layer in the stump, and one trainable (Dense) layer in the classifier. The main path of each residual block contains two trainable (Conv2D) layers, while the shorter skip connection path contains at most one trainable (Conv2D) layer (only when the stride is 2), which is not counted. Since we have 16 residual blocks, there are altogether 32 trainable layers in the main paths. Together this makes 34 trainable layers.
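
The count can be written out as a small calculation. The block numbers are those of the ResNet34 constructed above; `count_layers` is a hypothetical helper, and the second call applies the same counting to bottleneck blocks with three convolutional layers on the main path.

```python
blocks_per_group = [3, 4, 6, 3]   # residual blocks with 64, 128, 256, 512 filters

def count_layers(blocks, convs_on_main_path):
    stump = 1        # first Conv2D layer
    classifier = 1   # final Dense layer
    return stump + sum(blocks) * convs_on_main_path + classifier

print(count_layers(blocks_per_group, 2))  # 34
print(count_layers(blocks_per_group, 3))  # 50
```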

**Question 2: What do you need to change to get from the VGG16 to the VGG 19 architecture?**  
The only difference is that the VGG blocks 3, 4, and 5 have four convolutional layers instead of three.

**Question 3: How many trainable layers does GoogLeNet have?**  
The stump has three convolutional layers, there are nine Inception blocks with two trainable layers each along the longest path, and one dense layer in the classifier. Altogether that makes 3 + 18 + 1 = 22 trainable layers.
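
The difference between counting along the longest path and counting every layer with weights can be made explicit. This is a small sketch based on the architecture described above; the variable names are chosen for illustration.

```python
stump_convs = 3
inception_blocks = 9
convs_on_longest_path = 2   # e.g. 1x1 convolution followed by 3x3 convolution
convs_per_block = 6         # all convolutions of the four parallel paths
dense_layers = 1

depth = stump_convs + inception_blocks * convs_on_longest_path + dense_layers
all_weight_layers = stump_convs + inception_blocks * convs_per_block + dense_layers

print(depth)              # 22
print(all_weight_layers)  # 58
```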

**Question 4: Why do ResNet and GoogLeNet have so few trainable parameters in comparison to VGG?**  
Most of the trainable parameters in VGG are in the first fully connected (dense) layer that has 4,096 units that all take 25,088 values as inputs. Together with the bias, that makes 4,096\*25,089=102,764,544 trainable parameters, which is already around five times as many parameters as ResNet and GoogLeNet have in total. The second fully connected (dense) layer in VGG has another 16,781,312 trainable parameters, which is also quite a lot. ResNet and GoogLeNet, in contrast, use global average pooling to reduce the activation volume of the main convolutional body to a small vector of 512 and 1024 values, respectively; taken as input to the subsequent fully connected (dense) layer, this results in far fewer trainable parameters.
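
The numbers above follow from the parameter count of a dense layer, which is (inputs + 1) times units when a bias is used. The sketch below recomputes them and adds the classifier layers of the two other networks for comparison; `dense_params` is a hypothetical helper.

```python
def dense_params(inputs, units):
    # One weight per input plus one bias, for every unit
    return (inputs + 1) * units

vgg_fc1 = dense_params(7 * 7 * 512, 4096)   # flattened 7x7x512 activation volume
vgg_fc2 = dense_params(4096, 4096)
resnet_head = dense_params(512, 10)         # after global average pooling
googlenet_head = dense_params(1024, 10)     # after global average pooling

print(vgg_fc1)         # 102764544
print(vgg_fc2)         # 16781312
print(resnet_head)     # 5130
print(googlenet_head)  # 10250
```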