# <font style="color:blue">Assignment: LinkNet Architecture with VGG16</font>

The course already covered LinkNet with a `ResNet18` backbone. In this assignment, you will implement the LinkNet architecture with a `VGG16` backbone.

## <font color='blue'>Marking Scheme</font>

#### Maximum Points: 30

<div>
    <table>
        <tr><td><h3>Sr. no.</h3></td> <td><h3>Problem</h3></td> <td><h3>Points</h3></td> </tr>
        <tr><td><h3>1</h3></td> <td><h3>3. LinkNet Implementation</h3></td> <td><h3>30</h3></td> </tr>
    </table>
</div>

# <font style="color:green">1. LinkNet</font>

Let's briefly review the LinkNet architecture. [LinkNet](https://arxiv.org/pdf/1707.03718.pdf) was
introduced in 2017 by A. Chaurasia and E. Culurciello as a lightweight deep neural network for semantic
segmentation.

---

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w11-LinkNet_architecture.png'>

---

In the picture above, `/2` means downsampling the feature map by a factor of 2, which is achieved with a strided convolution, and `∗2` denotes upsampling by a factor of 2.
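The `/2` and `∗2` factors follow from the standard output-size formulas for convolution and transposed convolution. Below is a minimal sketch in plain Python, assuming dilation 1. The helper names `conv_out` and `deconv_out` are illustrative and not part of the assignment code, and the padding of 3 for the 7x7 convolution is an assumption, since the figure only gives the kernel size and the stride.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# "/2": a 7x7 convolution with stride 2 (padding 3 assumed) halves a 320-pixel side
halved = conv_out(320, kernel=7, stride=2, padding=3)

# "*2": a 2x2 transposed convolution with stride 2 doubles it again
doubled = deconv_out(halved, kernel=2, stride=2)

print(halved, doubled)
```

The same two formulas are enough to check every spatial size that appears later in this notebook.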


The encoder is the left half of the network, whereas the right half is the decoder.


# <font style="color:green">2. Implementation Guidelines </font>


---

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w11-VGG16-architecture-16.png'>
<center>Image credits: www.researchgate.net</center>


---

**1. The following block (before encoder) will change as follows for VGG16:**

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w11-assignment-1.png'>

- `conv[(7x7), (3, 64), /2]` will be replaced by the first convolution layer of `VGG16` (`conv[(3x3), (3, 64)]`).


- `max-pool[(3x3), /2]` will be replaced by the first max-pool layer of `VGG16` (`max-pool[(2x2), /2]`).


**2. Subsequent convolution layers up to and including the next max-pool layer will be used as one encoder block. For example, the fourth (`convolution + ReLU`), fifth (`convolution + ReLU`), and sixth (`max pooling`) layers combined will be used as `Encoder Block 1`.**
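To see where these block boundaries fall in `torchvision`'s `vgg16().features`, the sketch below walks over the VGG16 layer configuration in plain Python. It assumes the variant without batch normalization, where every convolution contributes two modules (`Conv2d` + `ReLU`) and every max-pool contributes one. The function name `block_slices` is illustrative only.

```python
# VGG16 layer configuration: numbers are conv output channels, "M" is a max-pool
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def block_slices(cfg):
    """Return (start, stop, out_channels) for every group ending in a max-pool."""
    slices = []
    start, index, channels = 0, 0, None
    for item in cfg:
        if item == "M":
            index += 1          # one MaxPool2d module closes the group
            slices.append((start, index, channels))
            start = index
        else:
            index += 2          # Conv2d + ReLU modules
            channels = item
    return slices


for start, stop, channels in block_slices(VGG16_CFG):
    print(f"features[{start}:{stop}] -> {channels} channels")
```

Under these assumptions the first group, `features[0:5]`, covers the stem (the initial convolutions plus the first max-pool), and the remaining four groups correspond to encoder blocks 1 to 4.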


**3. The decoder code is already given; you don't have to change anything in the decoder.**

**4. The number of output channels of `decoder block 1` must be `64`, although in principle it could be any number.**


**5. Let's call the block after `decoder block 1` the `classifier`. Write this block so that it handles the number of classes and restores the input image width and height (output size of the classifier = `[batch_size, num_classes, height, width]`).**

<img src='https://www.learnopencv.com/wp-content/uploads/2020/04/c3-w11-assignment2.png'>

- Replace the above block with your classifier. 
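With the VGG16 encoder, the output of `decoder block 1` is half the input size (160x160 for a 320x320 image), so the classifier needs exactly one 2x upsampling step. The sketch below compares a few transposed-convolution settings using the standard output-size formula (dilation 1 assumed). The helper names and the list of candidates are illustrative, not a prescribed solution.

```python
def deconv_out(size, kernel, stride, padding=0, output_padding=0):
    """Output length of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


decoder_size = 160  # side of the decoder block 1 output for a 320x320 input

# (kernel, stride, padding, output_padding) candidates for the upsampling layer
candidates = {
    "k=2, s=2": (2, 2, 0, 0),
    "k=3, s=2, p=1, op=1": (3, 2, 1, 1),
    "k=4, s=2, p=1": (4, 2, 1, 0),
    "k=3, s=2": (3, 2, 0, 0),
}
sizes = {name: deconv_out(decoder_size, *args) for name, args in candidates.items()}

for name, size in sizes.items():
    print(f"{name}: {decoder_size} -> {size}")

# A 3x3 transposed convolution without padding overshoots by one pixel;
# a following 2x2 convolution without padding trims it back to the target size.
trimmed = conv_out(sizes["k=3, s=2"], kernel=2)
print("after 2x2 conv:", trimmed)
```

The first three settings double the size exactly, while the last one needs a later layer to remove the extra pixel.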

# <font style="color:green">3. LinkNet Implementation [30 Points]</font>


**The LinkNet implementation points are divided into the following two parts:**

<p></p>

<div>
    <table>
        <tr><td><h3>Sr. no.</h3></td> <td><h3>Problem</h3></td> <td><h3>Points</h3></td> </tr>
        <tr><td><h3>1</h3></td> <td><h3>Encoder-Decoder Implementation</h3></td> <td><h3>15</h3></td> </tr>
        <tr><td><h3>2</h3></td> <td><h3>Classifier & Forward Implementation</h3></td> <td><h3>15</h3></td> </tr>
    </table>
</div>

In [1]:
import torch
# torch neural network (NN) module for building and training nets
import torch.nn as nn
# module with various model definitions
import torchvision.models as models

import random

## <font style="color:green">3.1. Decoder</font>

The block below is the decoder block. It takes a feature map with `channels_in` channels and returns a feature map with `channels_out` channels at twice the spatial resolution.

We use `ConvTranspose2d` for upsampling; see the details [here](https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d).

In [2]:
# create decoder block inherited from nn.Module
class DecoderBlock(nn.Module):
    def __init__(self, channels_in, channels_out):
        super().__init__()

        # 1x1 projection module to reduce channels
        self.proj = nn.Sequential(
            # convolution
            nn.Conv2d(channels_in, channels_in // 2, kernel_size=1, bias=False),
            # batch normalization
            nn.BatchNorm2d(channels_in // 2),
            # relu activation
            nn.ReLU()
        )

        # 2x upsampling module (depthwise transposed convolution)
        self.deconv = nn.Sequential(
            # deconvolution
            nn.ConvTranspose2d(
                channels_in // 2,
                channels_in // 2,
                kernel_size=2,
                stride=2,
                padding=0,
                output_padding=0,
                groups=channels_in // 2,
                bias=False
            ),
            # batch normalization
            nn.BatchNorm2d(channels_in // 2),
            # relu activation
            nn.ReLU()
        )

        # 1x1 unprojection module to increase channels
        self.unproj = nn.Sequential(
            # convolution
            nn.Conv2d(channels_in // 2, channels_out, kernel_size=1, bias=False),
            # batch normalization
            nn.BatchNorm2d(channels_out),
            # relu activation
            nn.ReLU()
        )

    # stack layers and perform a forward pass
    def forward(self, x):

        proj = self.proj(x)
        deconv = self.deconv(proj)
        unproj = self.unproj(deconv)

        return unproj
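
As a sanity check on the decoder block, its trainable parameters can be counted by hand. This is a minimal sketch in plain Python; the function name `decoder_block_params` is illustrative. It assumes that each `BatchNorm2d` contributes a weight and a bias per channel, that the convolutions have no bias (as in the code above), and that the depthwise 2x2 transposed convolution holds four weights per channel. The channel pairs are the ones used for `up4` to `up1` later in this notebook.

```python
def decoder_block_params(channels_in, channels_out):
    """Trainable parameters of DecoderBlock(channels_in, channels_out)."""
    mid = channels_in // 2
    proj = channels_in * mid + 2 * mid               # 1x1 conv (no bias) + batch norm
    deconv = mid * 2 * 2 + 2 * mid                   # depthwise 2x2 transposed conv + batch norm
    unproj = mid * channels_out + 2 * channels_out   # 1x1 conv (no bias) + batch norm
    return proj + deconv + unproj


decoder_channels = {
    "up4": (512, 512),
    "up3": (512, 256),
    "up2": (256, 128),
    "up1": (128, 64),
}
counts = {name: decoder_block_params(c_in, c_out)
          for name, (c_in, c_out) in decoder_channels.items()}

for name, count in counts.items():
    print(f"{name}: {count / 1e6} million parameters")
```

These counts agree with the `up4` to `up1` parameter figures in the reference listing of section 4.1.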

## <font style="color:green">3.2. LinkNet</font>

**Write your code where it is specified. Do not modify or delete any other code.**

In [17]:
# create LinkNet model with VGG16 encoder
class LinkNet(nn.Module):
    def __init__(self, num_classes, encoder="vgg16"):
        super().__init__()
        assert hasattr(models, encoder), "Undefined encoder type"
        # prepare feature extractor from `torchvision` vgg16 model
        
        vgg16 = getattr(models, encoder)(pretrained=False)
        
        ###########################################################################################
        # write code for self.init and self.maxpool
        # for self.init: the first convolution layer of VGG16 (conv[(3x3), (3, 64)]).
        # for self.maxpool: the first max-pool layer of VGG16 (max-pool[(2x2), /2]).
        ###########################################################################################
        
        self.init = None
        self.maxpool = None
        
        ###
        ### YOUR CODE HERE
        #print(vgg16)
        #print(len(vgg16.features))
       
        # all feature layers of VGG16 (reuse the model created above)
        encoder = vgg16.features
        # first two (convolution + ReLU) pairs, encoder[0:4]
        self.init = encoder[0:4]
        # first max-pool layer, encoder[4]
        self.maxpool = encoder[4]
        ###
        
        ############################################################################################
        # Write code for encoder blocks:
        # Subsequent convolution layers up to and including each max-pool form one encoder block.
        # Name the encoder blocks self.layer1, self.layer2, self.layer3, and self.layer4 for
        # encoder block 1, encoder block 2, encoder block 3, and encoder block 4 respectively.
        ############################################################################################
        
        self.layer1 = None
        self.layer2 = None
        self.layer3 = None
        self.layer4 = None
        
        ###
        ### YOUR CODE HERE
        self.layer1 = encoder[5:10]
        self.layer2 = encoder[10:17]
        self.layer3 = encoder[17:24]
        self.layer4 = encoder[24:31]    
        ###

        
        #############################################################################################
        # Decoder's block: DecoderBlock module
        
        # Write code for the decoder blocks here. As the DecoderBlock class is already defined,
        # you only have to instantiate it with the arguments channels_in and channels_out.
        
        # Name the decoder blocks self.up4, self.up3, self.up2, and self.up1 for decoder block 4,
        # decoder block 3, decoder block 2, and decoder block 1 respectively.
        #############################################################################################
        
        self.up4 = None
        self.up3 = None
        self.up2 = None
        
        # output channel of self.up1 must be 64
        self.up1 = None

        ###
        ### YOUR CODE HERE        
        
        self.up4 = DecoderBlock(channels_in=512, channels_out=512)
        self.up3 = DecoderBlock(channels_in=512, channels_out=256)
        self.up2 = DecoderBlock(channels_in=256, channels_out=128)
        self.up1 = DecoderBlock(channels_in=128, channels_out=64)
        ###

        # Classification block: define a classifier module
        
        ################################################################################################
        # You have to write the classifier part as a Sequential model.
        ################################################################################################
        self.classifier = nn.Sequential(
            ###
            ### YOUR CODE HERE
            # transposed convolution (kernel 3, stride 2, no padding): upsamples n -> 2n + 1
            nn.ConvTranspose2d(64, 32, 3, stride=2, bias=False),
            # batch normalization with num_features = 32
            nn.BatchNorm2d(32),
            # activation function
            nn.ReLU(),
            # convolutional layer
            nn.Conv2d(32, 32, kernel_size=3, padding=1, bias=False),
            # batch normalization with num_features = 32
            nn.BatchNorm2d(32),
            # activation function
            nn.ReLU(),
            # 2x2 convolution without padding: maps to num_classes and trims 2n + 1 -> 2n
            nn.Conv2d(32, num_classes, kernel_size=2, padding=0)          
            ###
        )        
        

    # define the forward pass
    def forward(self, x):
        
        #############################################################################################
        # You have to complete the forward method for LinkNet.
        #############################################################################################
        
        # for input image size (3, 320, 320)

        # output size = (64, 320, 320)
        init = self.init(x)
        # output size = (64, 160, 160)
        maxpool = self.maxpool(init)
        
        ###
        ### YOUR CODE HERE        
        layer1 = self.layer1(maxpool)
        layer2 = self.layer2(layer1)
        layer3 = self.layer3(layer2)
        layer4 = self.layer4(layer3)
        up4 = self.up4(layer4) + layer3
        up3 = self.up3(up4) + layer2
        up2 = self.up2(up3) + layer1
        up1 = self.up1(up2)
        ###

        # output size = (5, 320, 320), where 5 is the predefined number of classes
        output = self.classifier(up1)

        return output


# LinkNet architecture
#model = LinkNet(num_classes=5, encoder="vgg16")

# <font style="color:green">4. Check the implementation</font>

## <font style="color:green">4.1. Check Encoder-Decoder Implementation with Model Profiler</font>
Verify your encoder-decoder implementation with the model profiler before submitting it. 
Here, we will check the number of floating-point operations and the number of parameters. 

In [18]:
class ModelProfiler(nn.Module):
    """ Profile PyTorch models.

    Compute FLOPs (FLoating-point OPerations) and the number of trainable parameters of a model.

    Arguments:
        model (nn.Module): model which will be profiled.

    Example:
        model = torchvision.models.resnet50()
        profiler = ModelProfiler(model)
        var = torch.zeros(1, 3, 224, 224)
        profiler(var)
        print("FLOPs: {0:.5}; #Params: {1:.5}".format(profiler.get_flops('G'), profiler.get_params('M')))

    Warning:
        The model profiler doesn't work with models wrapped by torch.nn.DataParallel.
    """
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.flops = 0
        self.units = {'K': 10.**3, 'M': 10.**6, 'G': 10.**9}
        self.hooks = None
        self._remove_hooks()

    def get_flops(self, units='G'):
        """ Get number of floating operations per inference.

        Arguments:
            units (string): units of the flops value ('K': Kilo (10^3), 'M': Mega (10^6), 'G': Giga (10^9)).

        Returns:
            Floating-point operations per inference in the chosen units.
        """
        assert units in self.units
        return self.flops / self.units[units]

    def get_params(self, units='G'):
        """ Get number of trainable parameters of the model.

        Arguments:
            units (string): units of the parameter count ('K': Kilo (10^3), 'M': Mega (10^6), 'G': Giga (10^9)).

        Returns:
            Number of trainable parameters of the model in the chosen units.
        """
        assert units in self.units
        params = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
        if units is not None:
            params = params / self.units[units]
        return params

    def forward(self, *args, **kwargs):
        self.flops = 0
        self._init_hooks()
        output = self.model(*args, **kwargs)
        self._remove_hooks()
        return output

    def _remove_hooks(self):
        if self.hooks is not None:
            for hook in self.hooks:
                hook.remove()
        self.hooks = None

    def _init_hooks(self):
        self.hooks = []

        def hook_compute_flop(module, _, output):
            self.flops += module.weight.size()[1:].numel() * output.size()[1:].numel()

        def add_hooks(module):
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                self.hooks.append(module.register_forward_hook(hook_compute_flop))

        self.model.apply(add_hooks)
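
The hook above counts, for each `Conv2d` or `Linear` module, the number of weights per output element times the number of output elements (the batch dimension is skipped). The arithmetic can be reproduced in plain Python as sketched below. The function name `conv_flops` is illustrative, and the sketch assumes ungrouped convolutions and that `init` holds the first two 3x3 convolutions of VGG16, which is what the reference parameter count for `init` implies.

```python
def conv_flops(c_in, c_out, kernel, out_h, out_w):
    """Operations counted by the hook for one Conv2d with groups=1."""
    weights_per_output = c_in * kernel * kernel   # module.weight.size()[1:].numel()
    outputs = c_out * out_h * out_w               # output.size()[1:].numel()
    return weights_per_output * outputs


# init: two 3x3 convolutions on a 320x320 input (padding keeps the size)
init_flops = conv_flops(3, 64, 3, 320, 320) + conv_flops(64, 64, 3, 320, 320)

# layer1: two 3x3 convolutions on the 160x160 map that follows the first max-pool
layer1_flops = conv_flops(64, 128, 3, 160, 160) + conv_flops(128, 128, 3, 160, 160)

print("init GFLOPs:", init_flops / 1e9)
print("layer1 GFLOPs:", layer1_flops / 1e9)
```

Because the hook is registered only on `nn.Conv2d` and `nn.Linear`, the `ConvTranspose2d`, batch normalization, and pooling layers add nothing to the total, so the decoder figures reflect only the two 1x1 convolutions of each block.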

In [19]:
def profile_model(model, input_size, cuda):
    """ Compute FLOPS and #Params of the CNN.

    Arguments:
        model (nn.Module): model which should be profiled.
        input_size (tuple): size of the input variable.
        cuda (bool): if True then the variable will be uploaded to the GPU.

    Returns:
        dict:
            dict["flops"] (float): number of GFLOPs.
            dict["params"] (float): number of parameters in millions.
    """
    profiler = ModelProfiler(model)
    var = torch.zeros(input_size)
    if cuda:
        var = var.cuda()
    profiler(var)
    return {"flops": profiler.get_flops('G'), "params": profiler.get_params('M')}

In [20]:
def GFLOPs_and_Parms_count_summary(block, input_tensor, block_name):
    
    flops, params = profile_model(block, input_tensor.size(), False).values()
    
    print('{0}\n{1}\n{0}'.format('-'*50, block_name))

    print('GFLOPs:\t\t\t\t{}\nNo. of params (in million):\t{}'.format(flops, params))
    
    return block(input_tensor)

    

**Running the below cell, you should get the following outputs:**

```
--------------------------------------------------
init
--------------------------------------------------
GFLOPs:				        3.9518208
No. of params (in million):	0.03872
--------------------------------------------------
maxpool
--------------------------------------------------
GFLOPs:				        0.0
No. of params (in million):	0.0
--------------------------------------------------
layer1
--------------------------------------------------
GFLOPs:				        5.6623104
No. of params (in million):	0.22144
--------------------------------------------------
layer2
--------------------------------------------------
GFLOPs:				        9.437184
No. of params (in million):	1.475328
--------------------------------------------------
layer3
--------------------------------------------------
GFLOPs:				        9.437184
No. of params (in million):	5.899776
--------------------------------------------------
layer4
--------------------------------------------------
GFLOPs:				        2.8311552
No. of params (in million):	7.079424
--------------------------------------------------
up4
--------------------------------------------------
GFLOPs:				        0.065536
No. of params (in million):	0.265216
--------------------------------------------------
up3
--------------------------------------------------
GFLOPs:				        0.1572864
No. of params (in million):	0.199168
--------------------------------------------------
up2
--------------------------------------------------
GFLOPs:				        0.1572864
No. of params (in million):	0.050432
--------------------------------------------------
up1
--------------------------------------------------
GFLOPs:				        0.1572864
No. of params (in million):	0.012928
```


In [21]:
# input data for model check
input_tensor = torch.zeros(1, 3, 320, 320)

# LinkNet architecture
model = LinkNet(num_classes=5, encoder="vgg16")

input_tensor = GFLOPs_and_Parms_count_summary(model.init, input_tensor, 'init')
input_tensor = GFLOPs_and_Parms_count_summary(model.maxpool, input_tensor, 'maxpool')
input_tensor = GFLOPs_and_Parms_count_summary(model.layer1, input_tensor, 'layer1')
input_tensor = GFLOPs_and_Parms_count_summary(model.layer2, input_tensor, 'layer2')
input_tensor = GFLOPs_and_Parms_count_summary(model.layer3, input_tensor, 'layer3')
input_tensor = GFLOPs_and_Parms_count_summary(model.layer4, input_tensor, 'layer4')
input_tensor = GFLOPs_and_Parms_count_summary(model.up4, input_tensor, 'up4')
input_tensor = GFLOPs_and_Parms_count_summary(model.up3, input_tensor, 'up3')
input_tensor = GFLOPs_and_Parms_count_summary(model.up2, input_tensor, 'up2')
input_tensor = GFLOPs_and_Parms_count_summary(model.up1, input_tensor, 'up1')

--------------------------------------------------
init
--------------------------------------------------
GFLOPs:				3.9518208
No. of params (in million):	0.03872
--------------------------------------------------
maxpool
--------------------------------------------------
GFLOPs:				0.0
No. of params (in million):	0.0
--------------------------------------------------
layer1
--------------------------------------------------
GFLOPs:				5.6623104
No. of params (in million):	0.22144
--------------------------------------------------
layer2
--------------------------------------------------
GFLOPs:				9.437184
No. of params (in million):	1.475328
--------------------------------------------------
layer3
--------------------------------------------------
GFLOPs:				9.437184
No. of params (in million):	5.899776
--------------------------------------------------
layer4
--------------------------------------------------
GFLOPs:				2.8311552
No. of params (in million):	7.079424
--------------

In [22]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###


## <font style="color:green">4.2. Check classifier & forward Implementation</font>

**The input width and height are the same as the output width and height because semantic segmentation predicts a label for each pixel.**
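
The size bookkeeping behind this requirement can be traced without running the model. The sketch below follows `(channels, side)` through the network in plain Python. The function name `trace_linknet_vgg16` is illustrative, the channel numbers are the ones used in the `LinkNet` class above, and the classifier is assumed to perform one final 2x upsampling.

```python
def trace_linknet_vgg16(side, num_classes):
    """Follow (channels, side) through the VGG16-based LinkNet."""
    shapes = {"input": (3, side)}
    shapes["init"] = (64, side)            # 3x3 convolutions with padding 1 keep the size
    size = side // 2
    shapes["maxpool"] = (64, size)

    # every encoder block ends with a 2x2 max-pool that halves the side
    for name, channels in [("layer1", 128), ("layer2", 256),
                           ("layer3", 512), ("layer4", 512)]:
        size //= 2
        shapes[name] = (channels, size)

    # every decoder block doubles the side; skip connections need equal shapes
    for name, channels, skip in [("up4", 512, "layer3"), ("up3", 256, "layer2"),
                                 ("up2", 128, "layer1"), ("up1", 64, None)]:
        size *= 2
        shapes[name] = (channels, size)
        if skip is not None:
            assert shapes[name] == shapes[skip], f"{name} cannot be added to {skip}"

    shapes["classifier"] = (num_classes, size * 2)
    return shapes


shapes = trace_linknet_vgg16(side=320, num_classes=5)
for name, (channels, size) in shapes.items():
    print(f"{name}: ({channels}, {size}, {size})")
```

The trace only closes when the input side is a multiple of 32. For other sizes the floor division in the pooling steps makes a decoder output differ from its skip connection, and the assertion in the sketch fails.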

**Running the below cell, you should get the following outputs:**
```
Prediction Size: torch.Size([1, 5, 320, 320])
```

In [23]:
# input data for model check
input_tensor = torch.zeros(1, 3, 320, 320)

# LinkNet architecture
model = LinkNet(num_classes=5, encoder="vgg16")

# examining the prediction size
pred = model(input_tensor)
print('Prediction Size: {}'.format(pred.size()))

Prediction Size: torch.Size([1, 5, 320, 320])


In [None]:
###
### AUTOGRADER TEST - DO NOT REMOVE
###
