# **Final Exam for Deep Network Development course**
Name:

Neptun ID:

Date: 27/05/2024

Duration: 9AM-11AM

## General rules
This notebook contains the task to be solved in order to pass the exam and complete the course.
It contains a task similar to what you have worked on during the semester, which consists on implementing a network architecture and a function. There are additional requirements which are optional.

The exam has a duration of 2 hours. You are free to distribute the time as you please between the different requirements.
During the exam you can use any resource (internet, AI, practice notebooks, etc). **However it is strictly prohibited to use any communication channel** (Teams, WhatsApp, Messenger, etc.). Using any of those will result in immediate **FAIL**.

Your solution should be submitted to Canvas as a .ipynb file!

Please note that, to **PASS** the exam you must **SUBMIT A SUCCESSFUL SOLUTION SATISFYING THE MINIMUM REQUIREMENTS**. If you **FAIL** the exam, you have the right to retry it **ONE MORE TIME**. If you **FAIL AGAIN**, then unfortunately, you have failed the course.

If you **PASS** the exam, then the final grade is the weighted average of your asignment defenses (theory and code).

## Task description & Requirements
The task is to implement a custom architecture and its forward function.
The task is inspired on Image Captioning and the architecture is a U-Net like model with an LSTM at the end. The model receives an image as input and generates text as output (actually a probability distribution over a set of tokens). For the extra part, you are required to combine different inputs and to extend the architecture.

## Requirements:
------------------------------------------------------------------------
**Minimum requirements - ENOUGH TO PASS THE EXAM**
1.   Implement the layers of the architecture shown in section 1. Fill out the unknown parts in order to complete the architecture.
2.   Implement the forward function of the architecture. Make sure that the input and output are correct.

**!!! To complete the requirements 1 and 2, your final output should match the expected output indicated on cell 1.2. !!!**

------------------------------------------------------------------------

**Extra requirements - for grade improvement and potentially access to AI Lab**
3.   Use the output embedding of the encoder implemented on cell 1.1 and fuse it with another provided embedding. --> **+1 in final grade**
4.   Modify the architecture previously implemented to accept the new fused embedding and extend the architecture with the details shown on cell 1.4. --> **Access to AI Lab**
------------------------------------------------------------------------

In [1]:
#Necessary imports
import torch
import torch.nn as nn
import numpy as np

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

device(type='cpu')

## 1. Architecture

Please right click the image and "Open image in a new tab" to view it better with zoom. Or download it from here: https://drive.google.com/file/d/1P3q6dNxywzIpkmOhxhqN01I5NVlQWW7_/view?usp=sharing

<br>
<br>

![](https://drive.google.com/uc?export=view&id=1P3q6dNxywzIpkmOhxhqN01I5NVlQWW7_)

In [118]:
def get_stride_pad(input_size,output_size,kernel_size,padding):
  output_size=output_size-1
  stride=(input_size-kernel_size+(2*padding))/(output_size)
  return stride

get_stride_pad(127,127,3,1)

1.0

### 1.1. Implement the architecture and its forward function

In [114]:
class CustomNet(nn.Module):
    def __init__(self):
        super(CustomNet, self).__init__()
        self.upper_1=nn.Conv2d(3,16,kernel_size=3,stride=1,padding=0)
        self.upper_11=nn.MaxPool2d(kernel_size=2,stride=2)
        self.upper_2=nn.Conv2d(3,16,kernel_size=3,stride=1,padding=0)
        self.upper_22=nn.Conv2d(16,16,kernel_size=3,stride=2,padding=1)

        self.layer_3=nn.Sequential(nn.Conv2d(16,32,kernel_size=3,stride=1,padding='same'),
                                   nn.ReLU())
        self.layer_4=nn.Conv2d(16,32,kernel_size=3,stride=1,padding='same')
        self.layer_44=nn.Conv2d(32,32,kernel_size=3,stride=1,padding=1)

        #Define layers
        self.layer_5=nn.Sequential(nn.Conv2d(64,128,kernel_size=5,stride=2,padding=0),
                                   nn.BatchNorm2d(128))
        #bottleneck
        self.layer_6=nn.Sequential(nn.ConvTranspose2d(128,64,kernel_size=3,stride=1,padding=0),
                                   nn.ReLU())
        self.layer_66=nn.ConvTranspose2d(64,64,kernel_size=3,stride=2,padding=1)
        #upsampling
        self.decoder=nn.Sequential(nn.ConvTranspose2d(64,3,kernel_size=4,stride=2,padding=0),
                                   nn.Conv2d(3,16,kernel_size=11,stride=8,padding=0))

    def forward(self, x):
        print("Input:", x.shape)
        print()
        x1=self.upper_1(x)
        print("Upper_1:", x1.shape)
        print()
        x1=self.upper_11(x1)
        print("Upper_11:", x1.shape)
        print()
        x2=self.upper_2(x)
        print("Upper_2:", x2.shape)
        print()
        x2=self.upper_22(x2)
        print("Upper_22:", x2.shape)
        print()
        x3=x1+x2
        print("Concat:", x3.shape)
        print()
        x4=self.layer_3(x3)
        print("Layer_3:", x4.shape)
        print()
        x5=self.layer_4(x3)
        print("Layer_4:", x5.shape)
        print()
        x5=self.layer_44(x5)
        print("Layer_44:", x5.shape)
        print()
        x6=torch.cat([x4,x5], dim=1)
        print("Concat2:", x6.shape)
        print()
        x7=self.layer_5(x6)
        print("Layer_5:", x7.shape)
        print()
        x8=self.layer_6(x7)
        print("Layer_6:", x8.shape)
        print()
        x8=self.layer_66(x8)
        print("Layer_66:", x8.shape)
        print()


        x9=x8+x6
        print("Concat:", x9.shape)
        print()
        x10=self.decoder(x9)
        print("Decoder:", x10.shape)
        print()
        x11 = x10.permute(0, 2, 3, 1)     # [B, H, W, C] = [1, 15, 15, 16]
        x11 = x11.reshape(x10.size(0), -1, x10.size(1))  # [1, 225, 16]
        print("Reshape:", x11.shape)
        print()
        lstm=nn.LSTM(16,hidden_size=64,num_layers=1,batch_first=True)
        x12,_=lstm(x11)
        print("LSTM:", x12.shape)
        print()
        softmax=nn.Softmax2d()
        x13=softmax(x12)
        print("LSTM:", x13)
        out=x13
        # Define the encoder part
        print("Encoder")
        print()

        #Define the decoder part
        print("Decoder")
        print()

        # PLEASE NAME YOUR FINAL OUTPUT AS 'out' OR RENAME THE VARIABLE IN THE RETURN

        return out

### 1.2 Check if your implementation is correct
For a given arbitraty input of size (1,3,256,256) the expected output is (1,961,64)

In [115]:
image = torch.tensor(np.random.rand(3,256,256), dtype=torch.float32)
image = torch.unsqueeze(image, dim=0)

model = CustomNet()
output = model(image)
print()

try:
    assert output.shape == torch.Size([1, 961, 64])
    print("CONGRATULATIONS! You have PASSed the exam by successfully completing the minimum requirements!")
except AssertionError:
    print("Unfortunately, you have FAILed the exam by not being able to complete the minimum requirements.")

Input: torch.Size([1, 3, 256, 256])

Upper_1: torch.Size([1, 16, 254, 254])

Upper_11: torch.Size([1, 16, 127, 127])

Upper_2: torch.Size([1, 16, 254, 254])

Upper_22: torch.Size([1, 16, 127, 127])

Concat: torch.Size([1, 16, 127, 127])

Layer_3: torch.Size([1, 32, 127, 127])

Layer_4: torch.Size([1, 32, 127, 127])

Layer_44: torch.Size([1, 32, 127, 127])

Concat2: torch.Size([1, 64, 127, 127])

Layer_5: torch.Size([1, 128, 62, 62])

Layer_6: torch.Size([1, 64, 64, 64])

Layer_66: torch.Size([1, 64, 127, 127])

Concat: torch.Size([1, 64, 127, 127])

Decoder: torch.Size([1, 16, 31, 31])

Reshape: torch.Size([1, 961, 16])

LSTM: torch.Size([1, 961, 64])

LSTM: tensor([[[1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.],
         ...,
         [1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.],
         [1., 1., 1.,  ..., 1., 1., 1.]]], grad_fn=<SoftmaxBackward0>)
Encoder

Decoder


CONGRATULATIONS! You hav

In [48]:
import torch.nn.functional as F

# EXTRA REQUIREMENTS (OPTIONAL)

### 1.3. Fuse the embeddings
First obtain the image embeddings from the last layer of the encoder from the previously implemented architecture. Then combine it with the new provided embedding.

Please right click the image and "Open image in a new tab" to view it better with zoom. Or download it from here: https://drive.google.com/file/d/1pts-Dzka5fYD6clW3Pb1NCgEbPWZKF74/view?usp=sharing

<br>
<br>

![](https://drive.google.com/uc?export=view&id=1pts-Dzka5fYD6clW3Pb1NCgEbPWZKF74)

In [None]:
class CombineEmbeddings(nn.Module):
    def __init__(self, embed_dim, comb_dim):
        super(CombineEmbeddings, self).__init__()
        self.embed_dim = embed_dim
        self.comb_dim = comb_dim

    def forward(self, embedding_1, embedding_2):
        #Reshaping might be needed

        return attention_output

In [None]:
#REPLACE THIS WITH ACTUAL EMBEDDING FROM ENCODER OUTPUT OF PREVIOUSLY IMPLEMENTED ARCHITECTURE.
#FOR SIMPLICITY REMOVE BATCH SIZE - squeeze
encoder_embedding = torch.randn(128, 62, 62) #REPLACE THIS!!!!

#NEW EMBEDDING
new_embedding = torch.randn(128, 62, 62)

#combine embeddings according to the figure in 1.3.
combine = CombineEmbeddings(embed_dim=128, comb_dim=128) #for simplicity use the same size for the combination 128
output = combine(encoder_embedding, new_embedding)
print()

try:
    assert output.shape == torch.Size([128, 62, 62])
    print("CONGRATULATIONS! You have earned +1 in your final grade by successfully satisfying one of the extra requirements!")
except AssertionError:
    print("Sorry! Keep trying!")

### 1.4. Modify and extend the architecture
Remove all the existing decoder layers and use just a Transformer based decoder. The input should be the new combined embedding.

In [None]:
print("If you have completed this part, then CONGRATULATIONS! You will be suggested as a potential student to join the AI Lab!")

If you have completed this part, then CONGRATULATIONS! You will be suggested as a potential student to join the AI Lab!
