Let us suppose we wish to build a larger model from the graph below.

![](imgs/01/mlp_graph_larger.jpg)

We suppose that

1. The layers have no bias units
2. The activation function for hidden layers is `ReLU`

    $ \text{ReLU}(x) = \max(0, x)$

Moreover, we suppose that this is a classification problem.

As you might recall, when the number of classes is greater than 2, we encode the problem so that the output layer has as many neurons as there are classes. This establishes a one-to-one correspondence between output units and classes: the value of the $j$-th neuron represents the **confidence** of the network in assigning a given data instance to the $j$-th class.

Classically, when the network is encoded in this way, the activation function for the final layer is the **softmax** function.
If $C$ is the total number of classes,

$\text{softmax}(z_j) = \frac{\exp(z_j)}{\sum_{k=1}^C \exp(z_k)}$

where $j\in \{1,\cdots,C\}$ is one of the classes and $z_j$ is the pre-activation value of the $j$-th output neuron.

If we repeat this calculation for every $j$, we end up with $C$ normalized values (each between 0 and 1, and summing to 1), which can be interpreted as the confidence the network has in assigning the instance to each class.
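As a quick check of the formula above, here is a minimal sketch that computes the softmax by hand and compares it with the PyTorch built-in. The scores in `z` are made-up values chosen only for illustration.

```python
import torch

# Hypothetical raw scores for one data instance and C = 4 classes.
z = torch.tensor([[2.0, 1.0, 0.5, -1.0]])

# Softmax written out from the formula: exp(z_j) / sum_k exp(z_k).
manual = torch.exp(z) / torch.exp(z).sum(dim=1, keepdim=True)

# The PyTorch built-in, normalizing across the class dimension.
builtin = torch.softmax(z, dim=1)

print(manual)
print(builtin)
print(builtin.sum(dim=1))  # the values of each row sum to 1
```

The largest score receives the largest confidence, so the ordering of the classes is preserved by the softmax.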

#### Homework

1. Build the MLP in the image above using PyTorch built-ins
2. Provide the calculation for the exact number of parameters of the MLP
   - Do it first supposing that the layers don't have a bias term, then supposing that the bias is present wherever it's possible
3. Calculate the $L_1$ (vectorial) norm and the Frobenius norm for the parameters of each layer
4. Given 10 random datapoints, feed them into the network. This operation must be done all in one single command and must **not** make use of loops.
   - Given the output of the network, using PyTorch code, find the class of assignment of each datapoint. This also must be done in a single PyTorch command without using loops.
   - Drafting a vector of ground truths (whichever labels you like), provide code for the calculation of the accuracy
     - Tip: first get the number of correct assignments, then...

### Calculation of the number of parameters:
1. Recall that we have no bias; moreover, our network is dense, i.e. fully connected
2. Evaluation of the number of parameters (one weight per connection):

-> 5*11 = 55 connections between the 1st and the 2nd layer;

-> 11*16 = 176 connections between the 2nd and the 3rd layer;

-> 16*13 = 208 connections between the 3rd and the 4th layer;

-> 13*8 = 104 connections between the 4th and the 5th layer;

-> 8*4 = 32 connections between the 5th and the 6th layer.
##### In total, if we suppose no bias, we have 575 parameters

3. Now, let's suppose every layer has a bias term (one extra parameter per output unit):

-> (5+1)*11 = 66 parameters between the 1st and the 2nd layer;

-> (11+1)*16 = 192 parameters between the 2nd and the 3rd layer;

-> (16+1)*13 = 221 parameters between the 3rd and the 4th layer;

-> (13+1)*8 = 112 parameters between the 4th and the 5th layer;

-> (8+1)*4 = 36 parameters between the 5th and the 6th layer.
##### In total, if we suppose a bias in every layer, we have 627 parameters
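The arithmetic above can be reproduced with a few lines of plain Python. This is a small sketch: the helper `count_params` and the list `widths` are names introduced here for illustration, with the layer widths taken from the calculation above.

```python
# Layer widths of the MLP, from input to output.
widths = [5, 11, 16, 13, 8, 4]

def count_params(widths, bias):
    """Count the parameters of a dense MLP.

    Each pair of consecutive layers contributes n_in * n_out weights,
    plus n_out bias terms when `bias` is True.
    """
    total = 0
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        total += n_in * n_out + (n_out if bias else 0)
    return total

no_bias = count_params(widths, bias=False)
with_bias = count_params(widths, bias=True)
print(no_bias, with_bias)  # 575 627
```

Note that `(n_in + 1) * n_out` and `n_in * n_out + n_out` are the same quantity, so both ways of writing the per-layer count agree.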

In [5]:
# Building our model
import torch
from torchinfo import summary

class MLP_hw(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(5, 11, bias=False),
            torch.nn.ReLU(),
            torch.nn.Linear(11, 16, bias=False),
            torch.nn.ReLU(),
            torch.nn.Linear(16, 13, bias=False),
            torch.nn.ReLU(),
            torch.nn.Linear(13, 8, bias=False),
            torch.nn.ReLU(),
            torch.nn.Linear(8, 4, bias=False),
            torch.nn.Softmax(dim=1)
        )

    def forward(self, X):
        return self.layers(X)

my_model = MLP_hw()
summary(my_model)

Layer (type:depth-idx)                   Param #
MLP_hw                                   --
├─Sequential: 1-1                        --
│    └─Linear: 2-1                       55
│    └─ReLU: 2-2                         --
│    └─Linear: 2-3                       176
│    └─ReLU: 2-4                         --
│    └─Linear: 2-5                       208
│    └─ReLU: 2-6                         --
│    └─Linear: 2-7                       104
│    └─ReLU: 2-8                         --
│    └─Linear: 2-9                       32
│    └─Softmax: 2-10                     --
Total params: 575
Trainable params: 575
Non-trainable params: 0

In [6]:
from collections.abc import Iterable
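def pretty_print(obj, title=None, norm='L1'):
    """Print one norm per layer if `title` is a sequence of layer shapes, else print `title` and `obj`."""
    if title is not None:
        # A plain string is also Iterable, so treat it as a single title
        if isinstance(title, Iterable) and not isinstance(title, str):
            for count, t in enumerate(title):
                print(f'Layer number {count + 1}: {t}')
                print(f'{norm} norm: {obj[count]}')
                print('\n')
        else:
            print(title)
            print(obj)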

# Now, let's calculate the L1 norm of the weights for each layer
# print([val for val in my_model.state_dict().values()])  # Complete list of weights for each layer
W_L1_norm = [torch.norm(val, 1) for val in my_model.state_dict().values()]  # Vectorial L1 norm (sum of absolute values) for each layer
W_L2_norm = [torch.norm(val, 'fro') for val in my_model.state_dict().values()]  # Frobenius norm for each layer
model_size = [var.size() for var in my_model.state_dict().values()]
pretty_print(W_L1_norm, model_size)
pretty_print(W_L2_norm, model_size, norm = 'Frobenius')


Layer number 1: torch.Size([11, 5])
L1 norm: 12.364631652832031


Layer number 2: torch.Size([16, 11])
L1 norm: 26.224891662597656


Layer number 3: torch.Size([13, 16])
L1 norm: 25.478038787841797


Layer number 4: torch.Size([8, 13])
L1 norm: 13.951150894165039


Layer number 5: torch.Size([4, 8])
L1 norm: 4.625356197357178


Layer number 1: torch.Size([11, 5])
Frobenius norm: 1.9393653869628906


Layer number 2: torch.Size([16, 11])
Frobenius norm: 2.3132240772247314


Layer number 3: torch.Size([13, 16])
Frobenius norm: 2.047039747238159


Layer number 4: torch.Size([8, 13])
Frobenius norm: 1.5619230270385742


Layer number 5: torch.Size([4, 8])
Frobenius norm: 0.9907125234603882
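To make the two norms concrete, here is a minimal sketch on a tiny hand-written matrix, where both values can be checked by hand. It uses `torch.linalg.vector_norm` and `torch.linalg.matrix_norm`, which the PyTorch documentation recommends over `torch.norm`. The matrix `W` is a made-up example, not one of the model's weight matrices.

```python
import torch

# Small hypothetical weight matrix, standing in for one layer's weights.
W = torch.tensor([[1.0, -2.0],
                  [3.0, -4.0]])

# Vectorial L1 norm: the matrix is flattened and absolute values are summed.
l1 = torch.linalg.vector_norm(W, ord=1)       # 1 + 2 + 3 + 4 = 10

# Frobenius norm: square root of the sum of squared entries.
fro = torch.linalg.matrix_norm(W, ord='fro')  # sqrt(1 + 4 + 9 + 16) = sqrt(30)

print(l1.item(), fro.item())
```

The Frobenius norm of a matrix coincides with the $L_2$ norm of the same matrix flattened into a vector.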




In [8]:
# Obtain the output of 10 random datapoints
out_model = my_model(torch.randn((10, 5)))

print(out_model)      # print the obtained output
print('\n')

# Obtain the most supported result by our model
val, idx = torch.max(out_model, 1)

my_dict = {0: 'Dog', 1: 'Cat', 2: 'Horse', 3: 'Bird'}
for i in range(len(val)):
    print(f'The most supported result for the input {i} is:  {my_dict[idx[i].item()]}, with probability:  {round(val[i].item(), 3)}')

tensor([[0.2502, 0.2565, 0.2445, 0.2488],
        [0.2507, 0.2519, 0.2478, 0.2496],
        [0.2500, 0.2501, 0.2500, 0.2499],
        [0.2503, 0.2515, 0.2485, 0.2497],
        [0.2490, 0.2566, 0.2474, 0.2469],
        [0.2499, 0.2515, 0.2493, 0.2492],
        [0.2499, 0.2538, 0.2470, 0.2493],
        [0.2505, 0.2518, 0.2480, 0.2497],
        [0.2503, 0.2521, 0.2481, 0.2494],
        [0.2492, 0.2521, 0.2502, 0.2485]], grad_fn=<SoftmaxBackward0>)


The most supported result for the input 0 is:  Cat, with probability:  0.256
The most supported result for the input 1 is:  Cat, with probability:  0.252
The most supported result for the input 2 is:  Cat, with probability:  0.25
The most supported result for the input 3 is:  Cat, with probability:  0.251
The most supported result for the input 4 is:  Cat, with probability:  0.257
The most supported result for the input 5 is:  Cat, with probability:  0.252
The most supported result for the input 6 is:  Cat, with probability:  0.254
The most su

In [9]:
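# Here we evaluate the quality of our model's predictions
def Log_Loss_acc(out_model, target):
    # Negative log-likelihood (log loss); expects log-probabilities as input
    nll_loss = torch.nn.NLLLoss()
    return nll_loss(out_model, target)

def accuracy(out_model, target):
    # Fraction of datapoints whose most probable class matches the target
    return (torch.max(out_model, 1)[1] == target).sum().item()/len(target)


out_model_log = torch.log(out_model)  # NLLLoss expects log-probabilities, so take the log of the softmax output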
out_true = torch.randint(0, 4, (10,))
print(f'Accuracy Log Loss: {Log_Loss_acc(out_model_log, out_true)}')
print(f'Standard accuracy: {accuracy(out_model, out_true)}')

# For completeness, we could also add functions to compute the ROC curve, the AUC score, and so on

Accuracy Log Loss: 1.3840515613555908
Standard accuracy: 0.3
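Since the labels above are drawn at random, the accuracy is hard to verify by eye. The following sketch repeats the same accuracy computation on a small hand-made example. The tensors `probs` and `target` are hypothetical values chosen so that the result can be checked manually.

```python
import torch

# Hypothetical network outputs for 4 datapoints and 3 classes (each row sums to 1).
probs = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])

# Hypothetical ground-truth labels, one per datapoint.
target = torch.tensor([0, 1, 1, 0])

# Class of assignment: index of the largest confidence in each row, no loops.
pred = probs.argmax(dim=1)

# Number of correct assignments divided by the number of datapoints.
acc = (pred == target).float().mean().item()

print(pred, acc)
```

Here three of the four predictions match the labels (the third datapoint is assigned to class 2 instead of class 1), so the accuracy is 0.75.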
