In [11]:
# Here I define the class for the multilayer perceptron (MLP). It has 4 hidden layers and one output layer.
# Each hidden layer uses the Rectified Linear Unit (ReLU) as its activation function, whereas the last layer
# uses softmax, because we are facing a multi-class classification problem.

import torch
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(in_features=5, out_features=11, bias = False)
        self.layer2 = torch.nn.Linear(in_features=11, out_features=16, bias = False)
        self.layer3 = torch.nn.Linear(in_features=16, out_features=13, bias = False)
        self.layer4 = torch.nn.Linear(in_features=13, out_features=8, bias = False)
        self.layer5 = torch.nn.Linear(in_features=8, out_features=4, bias = False)

    def forward(self, X):
        out = self.layer1(X)
        out = torch.nn.functional.relu(out)
        out = self.layer2(out)
        out = torch.nn.functional.relu(out)
        out = self.layer3(out)
        out = torch.nn.functional.relu(out)
        out = self.layer4(out)
        out = torch.nn.functional.relu(out)
        out = self.layer5(out)
        # Normalize over the class dimension; calling softmax without dim is deprecated
        out = torch.nn.functional.softmax(out, dim=-1)
        return out
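
To make the role of the final softmax concrete, here is a minimal pure-Python sketch of what it does to the four raw scores coming out of `layer5`. The helper name `softmax` and the example scores are illustrative assumptions, not values taken from the model above.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores (hypothetical helper)."""
    shift = max(scores)                      # subtract the max to avoid overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw outputs for the 4 classes
probs = softmax([2.0, 1.0, 0.1, -1.0])
print(probs)       # four positive values, largest for the first class
print(sum(probs))  # the values sum to 1 (up to floating-point rounding)
```

The outputs are all positive and sum to 1, so they can be read as class probabilities, which is why softmax suits a multi-class problem.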


In [16]:
#Here I instantiate the model
model = MLP()

In [13]:
#Here we have the summary of the model, which shows the number of parameters between each pair of consecutive layers
from torchsummary import summary
summary(model)

Layer (type:depth-idx)                   Param #
├─Linear: 1-1                            55
├─Linear: 1-2                            176
├─Linear: 1-3                            208
├─Linear: 1-4                            104
├─Linear: 1-5                            32
Total params: 575
Trainable params: 575
Non-trainable params: 0


For each pair of consecutive layers without bias, the number of parameters can be computed easily from the number of neurons in the left-side layer and the number of neurons in the right-side layer. The formula is:

$N_{\text{parameters-layer-by-layer}} = N_{\text{neurons-left}} \times N_{\text{neurons-right}}$

For example, between the first hidden layer (11 neurons) and the second hidden layer (16 neurons): $N_{\text{parameters-layer-1-2}} = 11 \times 16 = 176$

If the right-side layer also has a bias term, the formula becomes:

$N_{\text{parameters-layer-by-layer}} = N_{\text{neurons-left}} \times N_{\text{neurons-right}} + N_{\text{neurons-right}}$

Indeed, each neuron in the right-side layer has its own bias term.
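
As a quick check of the two formulas, the sketch below recomputes the counts from the layer sizes used in the `MLP` class (5, 11, 16, 13, 8, 4). The helper name `linear_param_count` is an assumption introduced only for this illustration.

```python
def linear_param_count(n_left, n_right, bias=False):
    """Parameters between two fully connected layers (hypothetical helper)."""
    weights = n_left * n_right              # one weight per left-right neuron pair
    return weights + (n_right if bias else 0)  # one bias per right-side neuron

# Input size followed by the output size of each Linear layer in the model
layer_sizes = [5, 11, 16, 13, 8, 4]
pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))

no_bias = [linear_param_count(a, b) for a, b in pairs]
with_bias = [linear_param_count(a, b, bias=True) for a, b in pairs]

print(no_bias, sum(no_bias))      # [55, 176, 208, 104, 32] 575
print(with_bias, sum(with_bias))  # [66, 192, 221, 112, 36] 627
```

The bias-free counts match the summary table, and turning the bias on would add exactly one parameter per right-side neuron (52 in total).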


In [19]:
#Below I print the L1 and L2 norms of each set of parameters, using the state_dict method and the torch.norm function
from sys import stdout
stdout.write('\t       L1 Norm')
stdout.write('  L2 Norm')
stdout.write('\n')
for param_name, param in model.state_dict().items():
    stdout.write(f'{param_name}')
    for i in range(1,3):
        T = torch.norm(param,i)
        stdout.write('\t{:.3f}'.format(T)) 
        stdout.flush()
    stdout.write('\n')

	       L1 Norm  L2 Norm
layer1.weight	12.464	1.956
layer2.weight	27.184	2.360
layer3.weight	25.766	2.053
layer4.weight	15.898	1.749
layer5.weight	6.120	1.232
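
For reference, here is a minimal pure-Python sketch of the two norms printed above, assuming (as I understand `torch.norm` with a numeric `p` and no `dim`) that the weight matrix is treated as one flattened vector. The helper names and the small example matrix are illustrative, not taken from the model.

```python
import math

def l1_norm(values):
    """Sum of absolute values (hypothetical helper)."""
    return sum(abs(v) for v in values)

def l2_norm(values):
    """Square root of the sum of squares (hypothetical helper)."""
    return math.sqrt(sum(v * v for v in values))

# Made-up 2x2 weight matrix, flattened before taking the norms
weights = [[3.0, -4.0], [0.0, 0.0]]
flat = [w for row in weights for w in row]

print(l1_norm(flat))  # 7.0
print(l2_norm(flat))  # 5.0
```

The L1 norm grows with the number of entries, which is consistent with the larger layers in the table having the larger L1 values while their L2 norms stay close together.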
