# Deep Learning - Homework 01

### 1. Build the MLP using PT built-ins as in [lab-01](../labs/01-intro-to-pt.ipynb)

Let's summarize the structure of the MLP model:


**Layer 1**:  
* $z^{(1)} = W_1 x$, where $x \in \mathbb{R}^{5}$ and $W_1 \in \mathbb{R}^{11 \times 5}$
* $a^{(1)} = h^{(1)}(z^{(1)})$, where $h^{(1)}(x) = \mathrm{ReLU}(x)$

**Layer 2**:  
* $z^{(2)} = W_2 a^{(1)}$, where $a^{(1)} \in \mathbb{R}^{11}$ and $W_2 \in \mathbb{R}^{16 \times 11}$
* $a^{(2)} = h^{(2)}(z^{(2)})$, where $h^{(2)}(x) = \mathrm{ReLU}(x)$

**Layer 3**:  
* $z^{(3)} = W_3 a^{(2)}$, where $a^{(2)} \in \mathbb{R}^{16}$ and $W_3 \in \mathbb{R}^{13 \times 16}$
* $a^{(3)} = h^{(3)}(z^{(3)})$, where $h^{(3)}(x) = \mathrm{ReLU}(x)$

**Layer 4**:  
* $z^{(4)} = W_4 a^{(3)}$, where $a^{(3)} \in \mathbb{R}^{13}$ and $W_4 \in \mathbb{R}^{8 \times 13}$
* $a^{(4)} = h^{(4)}(z^{(4)})$, where $h^{(4)}(x) = \mathrm{ReLU}(x)$

**Layer 5**:  
* $z^{(5)} = W_5 a^{(4)}$, where $a^{(4)} \in \mathbb{R}^{8}$ and $W_5 \in \mathbb{R}^{4 \times 8}$
* $y = h^{(5)}(z^{(5)})$, where $h^{(5)}(x) = \mathrm{softmax}(x)$

In [12]:
import torch

class MLP(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.layer1 = torch.nn.Linear(in_features=5, out_features=11,  bias=False)
    self.layer2 = torch.nn.Linear(in_features=11, out_features=16, bias=False)
    self.layer3 = torch.nn.Linear(in_features=16, out_features=13, bias=False)
    self.layer4 = torch.nn.Linear(in_features=13, out_features=8,  bias=False)
    self.outlayer = torch.nn.Linear(in_features=8, out_features=4,  bias=False)
    
  def forward(self, x):
    out = self.layer1(x)
    out = torch.nn.functional.relu(out)
    out = self.layer2(out)
    out = torch.nn.functional.relu(out)
    out = self.layer3(out)
    out = torch.nn.functional.relu(out)
    out = self.layer4(out)
    out = torch.nn.functional.relu(out)
    out = self.outlayer(out)
    # Normalise over the class dimension; omitting dim is deprecated.
    out = torch.nn.functional.softmax(out, dim=-1)
    return out
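
A quick sanity check of the forward pass can be sketched as follows. It is a minimal example, not part of the original homework: the `torch.nn.Sequential` model is an assumed stand-in with the same layer sizes as the `MLP` class, and the batch size of 3 and the random input are arbitrary choices.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Assumed stand-in for the MLP class: same bias-free layers and activations.
model = torch.nn.Sequential(
    torch.nn.Linear(5, 11, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(11, 16, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(16, 13, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(13, 8, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(8, 4, bias=False), torch.nn.Softmax(dim=-1),
)

x = torch.randn(3, 5)  # hypothetical batch of 3 inputs in R^5
with torch.no_grad():
    y = model(x)

print(tuple(y.shape))  # one 4-dimensional output per input
print(y.sum(dim=-1))   # each softmax output should sum to 1
```

Each row of `y` should be a probability vector over the 4 output classes.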


### 2. Instantiate and summarise the MLP model built with PT

In [13]:
model = MLP()
model

MLP(
  (layer1): Linear(in_features=5, out_features=11, bias=False)
  (layer2): Linear(in_features=11, out_features=16, bias=False)
  (layer3): Linear(in_features=16, out_features=13, bias=False)
  (layer4): Linear(in_features=13, out_features=8, bias=False)
  (outlayer): Linear(in_features=8, out_features=4, bias=False)
)

In [5]:
import sys
!{sys.executable} -m pip install torch-summary  # install into the environment of the running kernel
from torchsummary import summary

Collecting torch-summary
  Downloading https://files.pythonhosted.org/packages/ca/db/93d18c84f73b214acfa4d18051d6f4263eee3e044c408928e8abe941a22c/torch_summary-1.4.5-py3-none-any.whl
Installing collected packages: torch-summary
Successfully installed torch-summary-1.4.5


In [6]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
summary(model, input_size=(1,5))

Layer (type:depth-idx)                   Param #
├─Linear: 1-1                            55
├─Linear: 1-2                            176
├─Linear: 1-3                            208
├─Linear: 1-4                            104
├─Linear: 1-5                            32
Total params: 575
Trainable params: 575
Non-trainable params: 0
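
The totals reported above can also be cross-checked without an extra package by counting tensor entries directly. The snippet below is a minimal sketch: the `torch.nn.Sequential` model is an assumed stand-in with the same bias-free layers as the `MLP` class, and the variable names are illustrative.

```python
import torch

# Assumed stand-in for the MLP class: same five bias-free Linear layers.
model = torch.nn.Sequential(
    torch.nn.Linear(5, 11, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(11, 16, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(16, 13, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(13, 8, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(8, 4, bias=False),
)

# numel() returns the number of entries of each parameter tensor.
per_layer = [p.numel() for p in model.parameters()]
total = sum(per_layer)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(per_layer)
print(total, trainable)
```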



### 3. Provide calculation for the exact number of parameters of the MLP (also in the case of bias terms)

Without bias terms, the number of parameters of the MLP equals the total number of entries of the weight matrices $W_1, W_2, W_3, W_4, W_5$, that is


$ N^{\text{w/o  bias}}_{\text{parameters}} = (11 \cdot 5) + (16 \cdot 11 ) + (13 \cdot 16) + (8 \cdot 13) + (4 \cdot 8) = 575$


If we had included bias terms in the MLP model, each layer would require an additional parameter vector $b_1, b_2, b_3, b_4, b_5$ whose dimension equals the output dimension of that layer. In that case the number of parameters would be


$ N^{\text{w/ bias}}_{\text{parameters}} = (11 \cdot 5 + 11) + (16 \cdot 11 + 16) + (13 \cdot 16 + 13) + (8 \cdot 13 + 8) + (4 \cdot 8 + 4) = 627$
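
The two sums above can be reproduced with a few lines of plain Python. This is a minimal sketch; the helper `count_parameters` and the list `layer_sizes` are illustrative names, not part of the original notebook.

```python
# Layer widths of the MLP: input 5, hidden 11/16/13/8, output 4.
layer_sizes = [5, 11, 16, 13, 8, 4]

def count_parameters(sizes, bias):
    """Sum n_out * n_in per layer, plus n_out per layer if bias is used."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_out * n_in
        if bias:
            total += n_out
    return total

n_without_bias = count_parameters(layer_sizes, bias=False)
n_with_bias = count_parameters(layer_sizes, bias=True)
print(n_without_bias, n_with_bias)  # 575 627
```

The difference of 52 is the sum of the output dimensions, $11 + 16 + 13 + 8 + 4$.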



### 4. Calculate the L1 and L2 norm of parameters for the params of each layer



In [10]:
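for index, module in enumerate(model.children()):

  with torch.no_grad():
    w = module.weight
    # For a 2-D tensor, ord=2 is the spectral norm (largest singular value).
    norm_w = torch.linalg.norm(w, ord=2).item()
    print(f'Layer {index+1}\nL2norm(w) = {norm_w:.3f}')
    # For a 2-D tensor, ord=1 is the maximum absolute column sum.
    norm_w = torch.linalg.norm(w, ord=1).item()
    print(f'L1norm(w) = {norm_w:.3f}\n')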

Layer 1
L2norm(w) = 1.235
L1norm(w) = 3.603

Layer 2
L2norm(w) = 1.102
L1norm(w) = 3.023

Layer 3
L2norm(w) = 0.904
L1norm(w) = 1.829

Layer 4
L2norm(w) = 0.880
L1norm(w) = 1.317

Layer 5
L2norm(w) = 0.748
L1norm(w) = 1.117
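
Note that for a 2-D weight, `torch.linalg.norm` with `ord=1` or `ord=2` computes a matrix norm (maximum absolute column sum and largest singular value, respectively). If the exercise instead intends entrywise norms of the flattened parameters, which is an assumption about the task and not something the homework states, `torch.linalg.vector_norm` could be used. The sketch below contrasts the two on a small hand-picked matrix so the values can be checked by hand.

```python
import torch

# Small illustrative matrix (not one of the model's weights).
w = torch.tensor([[1.0, -2.0],
                  [3.0,  4.0]])

# Entrywise norms: the tensor is flattened before the norm is taken.
entry_l1 = torch.linalg.vector_norm(w, ord=1).item()  # |1|+|2|+|3|+|4| = 10
entry_l2 = torch.linalg.vector_norm(w, ord=2).item()  # sqrt(1+4+9+16) = sqrt(30)

# Matrix norms, as computed by torch.linalg.norm on a 2-D input.
matrix_l1 = torch.linalg.norm(w, ord=1).item()  # max column sum: max(4, 6) = 6
matrix_l2 = torch.linalg.norm(w, ord=2).item()  # largest singular value

print(f'entrywise: L1 = {entry_l1:.3f}, L2 = {entry_l2:.3f}')
print(f'matrix:    L1 = {matrix_l1:.3f}, L2 = {matrix_l2:.3f}')
```

The spectral norm never exceeds the entrywise L2 (Frobenius) norm, so the two conventions generally give different numbers for the same layer.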

