### Deep Learning Homework 1

1) Taking inspiration from the notebook `01-intro-to-pt.ipynb`, build a class for the Multilayer Perceptron (MLP) whose architecture is drawn in the last figure of the notebook. As written there, no layer should have bias units, and the activation for each hidden layer should be the Rectified Linear Unit (ReLU) function, also called the ramp function. The activation leading to the output layer, instead, should be the softmax function, which prof. Ansuini explained during the last lecture. You can also find some notes on it in the notebook.

2) After having defined the class, create an instance of it and print a summary using a method of your choice.

3) Provide detailed calculations (layer-by-layer) on the exact number of parameters in the network.  
    3.1) Provide the same calculation for the case in which bias units are present in all layers (except the input).

4) For each layer within the MLP, calculate the L2 norm and L1 norm of its parameters.

In [1]:
import torch
from torch import nn
from torchsummary import summary

In [2]:
# 1)

class MLP(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.layer1 = nn.Linear(5, 11, bias=False)
        self.layer2 = nn.Linear(11, 16, bias=False)
        self.layer3 = nn.Linear(16, 13, bias=False)
        self.layer4 = nn.Linear(13, 8, bias=False)
        self.output = nn.Linear(8, 4, bias=False)
        
    def forward(self, X):
        relu = nn.functional.relu
        softmax = nn.functional.softmax
        
        out = relu(self.layer1(X))
        out = relu(self.layer2(out))
        out = relu(self.layer3(out))
        out = relu(self.layer4(out))
        # Softmax over dim=1 (the class dimension); assumes batched input of shape (batch, 5)
        out = softmax(self.output(out), dim=1)
    
        return out

In [3]:
# 2)

# Instantiate and test
mlp = MLP()
X = torch.randn((3, 5))
mlp(X)

# Summary of the model
summary(mlp)

Layer (type:depth-idx)                   Param #
├─Linear: 1-1                            55
├─Linear: 1-2                            176
├─Linear: 1-3                            208
├─Linear: 1-4                            104
├─Linear: 1-5                            32
Total params: 575
Trainable params: 575
Non-trainable params: 0


3) In a fully connected network, each neuron of layer $i$ is connected to every neuron of layer $i+1$. Let $n_i$ be the number of neurons in layer $i$. Each of the $n_i$ neurons in layer $i$ has $n_{i+1}$ outgoing connections, so there are $n_i \cdot n_{i+1}$ connections between the two layers. Without bias units each connection corresponds to exactly one parameter, so the layer contributes $n_i \cdot n_{i+1}$ parameters. In our network the number of parameters is:  
* input to layer1: 5*11 = 55
* layer1 to layer2: 11*16 = 176
* layer2 to layer3: 16*13 = 208
* layer3 to layer4: 13*8 = 104
* layer4 to output: 8*4 = 32

These values match the per-layer counts in the summary of our model, and they add up to 55 + 176 + 208 + 104 + 32 = 575 parameters.
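The same count can be reproduced without PyTorch. The following is a minimal sketch: the list `layer_sizes` and the helper `count_weights` are illustrative names that do not appear in the notebook, and the widths are read off the `MLP` class defined above.

```python
# Layer widths read off the MLP class: input, four hidden layers, output.
layer_sizes = [5, 11, 16, 13, 8, 4]


def count_weights(sizes):
    """Return the per-layer weight counts n_i * n_{i+1} of a bias-free MLP."""
    return [n_in * n_out for n_in, n_out in zip(sizes, sizes[1:])]


per_layer = count_weights(layer_sizes)
print(per_layer)       # [55, 176, 208, 104, 32]
print(sum(per_layer))  # 575
```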

3.1) With bias units the number of parameters increases: there is one additional parameter for each neuron in layer $i+1$, which gives $n_i \cdot n_{i+1} + n_{i+1} = (n_i+1) \cdot n_{i+1}$ parameters. Equivalently, we can view the bias as an additional neuron in layer $i$ whose value is fixed to 1. In our network this corresponds to:

* input to layer1: (5+1)*11 = 66
* layer1 to layer2: (11+1)*16 = 192
* layer2 to layer3: (16+1)*13 = 221
* layer3 to layer4: (13+1)*8 = 112
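* layer4 to output: (8+1)*4 = 36

The total is 66 + 192 + 221 + 112 + 36 = 627 parameters, i.e. 52 more than in the bias-free network (one per neuron in every layer except the input: 11 + 16 + 13 + 8 + 4 = 52).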
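As a quick sanity check of the formula $(n_i+1) \cdot n_{i+1}$, the counts with bias can be computed in plain Python. This is only a sketch; `layer_sizes` and `count_params_with_bias` are hypothetical names introduced here, not part of the notebook.

```python
# Layer widths read off the MLP class: input, four hidden layers, output.
layer_sizes = [5, 11, 16, 13, 8, 4]


def count_params_with_bias(sizes):
    """Return the per-layer counts (n_i + 1) * n_{i+1}: weights plus one bias per output neuron."""
    return [(n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:])]


with_bias = count_params_with_bias(layer_sizes)
# The bias terms alone: one per neuron in every layer except the input.
bias_only = sum(layer_sizes[1:])

print(with_bias)       # [66, 192, 221, 112, 36]
print(sum(with_bias))  # 627
print(bias_only)       # 52
```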

In [4]:
# 4)

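# With a numeric p and no dim argument, torch.norm treats the tensor as a flat vector:
# p=2 gives the square root of the sum of squared weights, p=1 the sum of absolute values.
for param_name, param in mlp.state_dict().items():
    print(param_name, "L2 norm:", torch.norm(param, 2), "L1 norm:", torch.norm(param, 1))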

layer1.weight L2 norm: tensor(1.9684) L1 norm: tensor(12.6293)
layer2.weight L2 norm: tensor(2.4287) L1 norm: tensor(28.1345)
layer3.weight L2 norm: tensor(2.0713) L1 norm: tensor(25.6020)
layer4.weight L2 norm: tensor(1.5003) L1 norm: tensor(13.1404)
output.weight L2 norm: tensor(1.0281) L1 norm: tensor(4.8594)
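To make explicit what these two numbers mean, here is a small pure-Python sketch of the L1 and L2 norms of a weight matrix treated as a flat vector. The helpers `l1_norm` and `l2_norm` and the matrix `toy_weights` are made up for illustration; they are not the weights of our network.

```python
import math


def l1_norm(matrix):
    """Sum of the absolute values of all entries."""
    return sum(abs(w) for row in matrix for w in row)


def l2_norm(matrix):
    """Square root of the sum of the squared entries."""
    return math.sqrt(sum(w * w for row in matrix for w in row))


# Toy 2x2 weight matrix chosen so that both norms are easy to check by hand.
toy_weights = [[3.0, -4.0], [0.0, 0.0]]

print(l1_norm(toy_weights))  # 7.0
print(l2_norm(toy_weights))  # 5.0
```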
