This is the "official" solution for this HW session.
As I mentioned multiple times during the lab, Python and PyTorch offer several ways to accomplish many tasks, so a correct solution may well differ from what I show here.

In [6]:
import torch
import torchsummary

**Ex. 1**: implement the neural network from the figure. No bias terms, relu activation for hidden layers, softmax for output.

In [8]:
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # connections from input layer leading to first layer
        self.layer1 = torch.nn.Linear(in_features=5, out_features=11, bias=False)
        self.layer2 = torch.nn.Linear(in_features=11, out_features=16, bias=False)
        self.layer3 = torch.nn.Linear(in_features=16, out_features=13, bias=False)
        self.layer4 = torch.nn.Linear(in_features=13, out_features=8, bias=False)
        # connections leading to output layer
        self.layer5 = torch.nn.Linear(in_features=8, out_features=4, bias=False)

    def forward(self, X):
        out = self.layer1(X)
        out = torch.nn.functional.relu(out)
        out = self.layer2(out)
        out = torch.nn.functional.relu(out)
        out = self.layer3(out)
        out = torch.nn.functional.relu(out)
        out = self.layer4(out)
        out = torch.nn.functional.relu(out)
        out = self.layer5(out)
        out = torch.nn.functional.softmax(out, dim=1) # note: works also without specifying the dim, but raises a deprecation warning
        return out
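
As an example of "things may be done differently", here is a minimal sketch of the same architecture built with `torch.nn.Sequential`, followed by a quick sanity check on a dummy batch. The names `sizes`, `seq_net`, `X` and `probs` are my own choices for illustration and the batch of 3 random samples is an assumption; this is not the reference solution above, just an alternative.

```python
import torch

# Hypothetical alternative to the MLP class: same layer sizes, no biases,
# ReLU after each hidden layer, softmax on the output layer.
sizes = [5, 11, 16, 13, 8, 4]
modules = []
for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
    modules.append(torch.nn.Linear(n_in, n_out, bias=False))
    is_last = i == len(sizes) - 2
    modules.append(torch.nn.Softmax(dim=1) if is_last else torch.nn.ReLU())
seq_net = torch.nn.Sequential(*modules)

# Sanity check on a dummy batch: 3 samples, 5 input features each
X = torch.randn(3, 5)
with torch.no_grad():
    probs = seq_net(X)

print(probs.shape)       # one row per sample, one column per output neuron
print(probs.sum(dim=1))  # softmax over dim=1: each row should sum to (about) 1
```

The check on the row sums is a cheap way to confirm that the softmax is applied along the right dimension.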

**Ex. 2**: print a summary of the net

In [9]:
net = MLP()
_ = torchsummary.summary(net) # note: "_ =" avoids the double printing of the output

Layer (type:depth-idx)                   Param #
├─Linear: 1-1                            55
├─Linear: 1-2                            176
├─Linear: 1-3                            208
├─Linear: 1-4                            104
├─Linear: 1-5                            32
Total params: 575
Trainable params: 575
Non-trainable params: 0
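
If you do not have the summary package at hand, the same per-layer counts can be recovered with plain PyTorch. The following is a small sketch that rebuilds bias-free linear layers of the same sizes (the names `sizes`, `layers` and `counts` are mine) and counts the entries of each weight matrix with `numel()`.

```python
import torch

# Layer sizes of the network, input layer first
sizes = [5, 11, 16, 13, 8, 4]

# One bias-free linear layer per pair of consecutive sizes
layers = [
    torch.nn.Linear(n_in, n_out, bias=False)
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

# numel() returns the number of entries of a tensor
counts = [sum(p.numel() for p in layer.parameters()) for layer in layers]

print(counts)       # [55, 176, 208, 104, 32]
print(sum(counts))  # 575
```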


**Ex. 3:** computing the number of parameters, both without and with bias terms

a) The biasless case is straightforward: for each layer (except the input), we multiply its no. of neurons by the no. of neurons of the previous layer. This gives

$11*5+16*11+13*16+8*13+4*8 = 575$ as in the output above

b) The case with biases is also straightforward given a): there is one bias term per neuron with incoming connections, so we just add the number of neurons of every layer except the input (*note: this includes the output layer, since it has incoming connections!*)

$575 + 11+16+13+8+4 = 627$

Equivalently, it can be computed in one go as a sum over the layers:

$\sum_{l=1}^{5} (\text{no. neurons}_{l-1} + 1) \cdot \text{no. neurons}_l$ (here layer 0 is the input layer, of course)
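
The arithmetic above can be double-checked in a few lines of plain Python. This is only a sketch; the variable names (`sizes`, `pairs`, `n_weights`, `n_biases`, `n_total`) are mine.

```python
# Layer sizes, input layer (layer 0) first
sizes = [5, 11, 16, 13, 8, 4]

# (previous layer size, current layer size) for layers 1..5
pairs = list(zip(sizes[:-1], sizes[1:]))

# a) weights only: product of the sizes of consecutive layers
n_weights = sum(n_prev * n_cur for n_prev, n_cur in pairs)

# b) one bias per neuron in every layer except the input
n_biases = sum(sizes[1:])

# single-formula version: (n_{l-1} + 1) * n_l summed over the layers
n_total = sum((n_prev + 1) * n_cur for n_prev, n_cur in pairs)

print(n_weights)             # 575
print(n_weights + n_biases)  # 627
print(n_total)               # 627
```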

**Ex. 4**: printing norms of the parameters

This exercise had multiple solutions depending on the type of norm used.

The parameters are organized in matrices, so with the L1 and L2 norms there can be some confusion about whether to use:

- vector norms for "unrolled"/"flattened" matrices (`torch.norm`)
- matrix 1-norm and 2-norm (`torch.linalg.norm`)

Nonetheless, both solutions are good, since the goal of the exercise was to combine the analysis of the `state_dict` with some basic linear algebra capabilities of PyTorch.
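
To see the difference between the two families of norms, here is a small sketch on a hand-made 2x2 matrix (the matrix `A` and the variable names are assumptions of mine, chosen so the numbers can be checked by hand).

```python
import torch

A = torch.tensor([[1.0, -2.0],
                  [3.0,  4.0]])

# Vector norms: the matrix is treated as a flat vector of 4 entries
vec_l1 = A.norm(1)   # |1| + |-2| + |3| + |4| = 10
vec_l2 = A.norm(2)   # sqrt(1 + 4 + 9 + 16) = sqrt(30)

# Matrix norms
mat_1 = torch.linalg.norm(A, 1)  # max absolute column sum = max(4, 6) = 6
mat_2 = torch.linalg.norm(A, 2)  # largest singular value of A
fro = torch.linalg.norm(A)       # Frobenius norm, same value as vec_l2

print(vec_l1, mat_1)
print(vec_l2, mat_2, fro)
```

The matrix 2-norm is never larger than the Frobenius norm, which is consistent with the second line printed for each layer in the output of this exercise being smaller than the first.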

In [17]:
print("Param name", "\t", "L1 norm", "\t", "L2 norm")
for param_name, param in net.state_dict().items(): # also good: "for param_name, param in net.named_parameters()" OR "for param in net.parameters()"...
    print(param_name, param.norm(1), param.norm(2)) # vector L1 and L2 norms of the flattened matrix
    print(param_name, torch.linalg.norm(param, 1), torch.linalg.norm(param, 2)) # matrix 1-norm and 2-norm: good as well
    # note: torch.linalg.norm(param) with no order returns the Frobenius matrix norm, which equals the vector L2 norm

Param name 	 L1 norm 	 L2 norm
layer1.weight tensor(13.3416) tensor(2.0372)
layer1.weight tensor(3.2201) tensor(1.3089)
layer2.weight tensor(25.5253) tensor(2.2320)
layer2.weight tensor(2.8155) tensor(1.0845)
layer3.weight tensor(25.6307) tensor(2.0681)
layer3.weight tensor(2.2152) tensor(0.9858)
layer4.weight tensor(14.5009) tensor(1.6472)
layer4.weight tensor(1.3541) tensor(0.9130)
layer5.weight tensor(5.3351) tensor(1.0939)
layer5.weight tensor(0.9057) tensor(0.8626)


Note: if you wish to get a plain Python number instead of a singleton tensor, just append `.item()`.
Don't do something like `.detach().numpy()`: you would be wasting computation by converting the tensor to a NumPy array first and only then to a scalar.
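
A minimal sketch of what this looks like in practice, using a single stand-alone layer (the `layer` object and the 4-decimal formatting are my own choices for illustration):

```python
import torch

# A single bias-free layer, just to have a weight matrix to measure
layer = torch.nn.Linear(in_features=5, out_features=11, bias=False)

for name, param in layer.state_dict().items():
    l1 = param.norm(1).item()  # plain Python float instead of tensor(...)
    l2 = param.norm(2).item()
    print(f"{name}\t{l1:.4f}\t{l2:.4f}")
```

Since `.item()` returns a regular Python float, it can be used directly in format strings, which also keeps the printed table aligned.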