# Part 1

## Lecture 1
**Multilayer perceptron** (deep feedforward network):
- Depth: input (1), hidden layer(s), output (1)
    - Ex: 3
- (Learnable) parameters: weights & biases
    - Ex: 9 (weights are the arrows, biases sit at the units with activation functions)
- Update the weights by minimizing a cost/loss/error function
    - MSE (mean squared error), CE (cross-entropy)
    - Ex: reduce the loss by stepping in the direction opposite to the sign of the derivative (gradient descent)
- The gradient is the vector of all partial derivatives of the loss, one per weight
    - Mostly computed in batches, since on huge datasets a step over all samples takes too long
    - Ex: SGD approximates the gradient from a (small) number of samples
        - smaller batches: more noise
        - each step is computationally cheaper than a full-batch step
        - the samples are chosen stochastically
- The data might not be linearly separable, so the decision boundary has to be made non-linear:
    - Generic kernel function
    - Designing feature extractors by hand
    - Learn it (can add prior knowledge to restrict the generic approach, more flexible than hand-designed extractors)
    - Use non-linear functions
        - ReLU, Sigmoid, Max, ...
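
The gradient descent and SGD notes above can be sketched on a toy problem. This is only an illustration, not the lecture's code: the data (`y = 2x + 1`), the learning rate `lr`, and `batch_size` are all assumed values, and the gradients of the MSE are written out by hand.

```python
import random

random.seed(0)

# Toy, noise-free data (assumed for illustration): y = 2x + 1
xs = [i / 10 - 1.0 for i in range(21)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0   # learnable parameters: one weight, one bias
lr = 0.1          # step size (assumed value)
batch_size = 4    # small batch: cheap but noisy gradient estimate

for step in range(2000):
    # Stochastically choose the samples for this step.
    batch = random.sample(range(len(xs)), batch_size)

    # Partial derivatives of the batch MSE with respect to w and b.
    grad_w = 0.0
    grad_b = 0.0
    for i in batch:
        err = (w * xs[i] + b) - ys[i]
        grad_w += 2 * err * xs[i] / batch_size
        grad_b += 2 * err / batch_size

    # Move against the gradient to reduce the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)
```

Using the whole dataset as one batch would give plain gradient descent; the update rule stays the same.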

<img src="./image/Multilayeredperceptron.png" height="250" />
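
Why the non-linear function matters can be checked numerically. The sketch below uses assumed sizes (5 samples, 3 inputs, 4 hidden units, 1 output) and random weights; it shows that two linear layers without an activation collapse into a single linear layer, while inserting a ReLU breaks that equivalence.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64  # double precision keeps the comparison tight

x = torch.randn(5, 3, dtype=dtype)   # 5 samples, 3 features
W1 = torch.randn(3, 4, dtype=dtype)  # first layer weights
c = torch.randn(4, dtype=dtype)      # first layer bias
W2 = torch.randn(4, 1, dtype=dtype)  # second layer weights
b = torch.randn(1, dtype=dtype)      # second layer bias

# Two linear layers with no activation in between...
stacked = (x @ W1 + c) @ W2 + b
# ...equal one linear layer with merged weights and bias.
merged = x @ (W1 @ W2) + (c @ W2 + b)
linear_collapses = torch.allclose(stacked, merged)

# With a ReLU in between, the merged linear layer no longer matches.
with_relu = torch.relu(x @ W1 + c) @ W2 + b
relu_differs = not torch.allclose(with_relu, merged)

print(linear_collapses, relu_differs)
```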

## [Assignment 1: Forward pass](https://colab.research.google.com/drive/1cXVLmuFp24zI35OtKSEntARTGg0YHEyq)

The forward pass only computes the **loss and activations**; gradients are computed later, in the backward pass.

*in_features = 7, out_features = 5*

<img src="./image/fully_connected_layer.png" height="250" />

- **in** = (n_samples, in_features)
- Linear (fully connected) layer: maps in_features to out_features
    - Weight = (in_features, out_features); PyTorch's `nn.Linear` stores it transposed, as (out_features, in_features)
    - Bias = (out_features,)
- **out** = (n_samples, out_features)
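
A minimal sketch of these shapes, using random tensors with the sizes from the notes (3 samples, 7 input features, 5 output features). The comparison with `nn.Linear` is added here only as a sanity check; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_samples, in_features, out_features = 3, 7, 5

x = torch.randn(n_samples, in_features)     # in:     (3, 7)
W = torch.randn(in_features, out_features)  # weight: (7, 5)
b = torch.randn(out_features)               # bias:   (5,)

# Fully connected layer: the bias is broadcast over the samples.
out = x @ W + b                             # out:    (3, 5)

# nn.Linear computes x @ weight.T + bias, so its weight is (5, 7).
layer = nn.Linear(in_features, out_features)
with torch.no_grad():
    layer.weight.copy_(W.T)
    layer.bias.copy_(b)
matches_nn_linear = torch.allclose(layer(x), out, atol=1e-4)

print(out.shape, layer.weight.shape, matches_nn_linear)
```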

Non-linear Functions:
- ReLU(x) = max(0, x)
    - Range: [0, $$+\infty$$)
- Sigmoid(x) = $$\sigma(x) = \frac{1}{1 + \exp(-x)}$$
    - Range: (0, 1)

Loss Functions:
- $$\text{MSE}(\hat{y},y)= \frac{1}{n} \sum_{i=1}^n (\hat{y}_i-y_i)^2$$
    - unbounded above; training minimizes this error
    - used for regression
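
As a small worked example of the MSE formula, the sketch below computes it by hand and with `torch.nn.functional.mse_loss` (which averages by default). The prediction and target values are made up for illustration.

```python
import torch
import torch.nn.functional as F

y_hat = torch.tensor([2.5, 0.0, 2.0])  # predictions (made-up values)
y = torch.tensor([3.0, -0.5, 2.0])     # targets (made-up values)

# Mean of the squared differences, as in the formula.
mse_manual = ((y_hat - y) ** 2).mean()

# Built-in equivalent; the default reduction is the mean.
mse_builtin = F.mse_loss(y_hat, y)

print(mse_manual.item(), mse_builtin.item())
```

The squared errors here are 0.25, 0.25 and 0, so the mean is 0.5 / 3.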

```python
import torch
import torch.nn as nn
from torchinfo import summary

nr_samples = 3
in_features = 7
out_features = 5  # also the bias size when bias=True
bias = False

class Net(nn.Module):
    """A single fully connected layer: in_features -> out_features."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features, out_features, bias=bias)

    def forward(self, x):
        return self.fc1(x)


# Print a per-layer table of shapes and parameter counts.
model_output = summary(
    Net(),
    (in_features,),
    verbose=2,
    col_width=16,
    col_names=["kernel_size", "input_size", "output_size", "num_params"])
```

```text
Layer (type:depth-idx)                   Kernel Shape     Input Shape      Output Shape     Param #
├─Linear: 1-1                            [7, 5]           [7]              [5]              35
Total params: 35
Trainable params: 35
Non-trainable params: 0
Total mult-adds (M): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
```


Another example, a network with one hidden layer and a ReLU activation:

$$
w^T\max(0, W^Tx+c) + b
$$
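
The expression above can be evaluated term by term with plain tensors. This is a sketch with random values; the sizes (3 inputs, 4 hidden units) are assumed to match the network defined in the next cell.

```python
import torch

torch.manual_seed(0)
in_features, hidden_size = 3, 4

x = torch.randn(in_features)               # input vector
W = torch.randn(in_features, hidden_size)  # hidden layer weights
c = torch.randn(hidden_size)               # hidden layer bias
w = torch.randn(hidden_size)               # output weights
b = torch.randn(())                        # output bias (scalar)

# Hidden activations: max(0, W^T x + c)
h = torch.clamp(W.T @ x + c, min=0)

# Output: w^T h + b, a single scalar
f = w @ h + b

print(h, f)
```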

```python
import torch
import torch.nn as nn
from torchinfo import summary

nr_samples = 3
in_features = 3
hidden_size = 4
out_features = 1
bias = True

class Net(nn.Module):
    """One hidden layer with ReLU: in_features -> hidden_size -> out_features."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden_size, bias=bias)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(hidden_size, out_features, bias=bias)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))


# Print a per-layer table of shapes and parameter counts.
model_output = summary(
    Net(),
    (in_features,),
    verbose=2,
    col_width=16,
    col_names=["kernel_size", "input_size", "output_size", "num_params"])
```

```text
Layer (type:depth-idx)                   Kernel Shape     Input Shape      Output Shape     Param #
├─Linear: 1-1                            [3, 4]           [3]              [4]              16
├─ReLU: 1-2                              --               [4]              [4]              --
├─Linear: 1-3                            [4, 1]           [4]              [1]              5
Total params: 21
Trainable params: 21
Non-trainable params: 0
Total mult-adds (M): 0.00
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
```
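
The parameter counts in the two summaries can be reproduced by hand: a linear layer has `in * out` weights plus `out` biases (if enabled). The sketch below rebuilds both networks with `nn.Sequential`; `count_params` is a hypothetical helper, not part of torchinfo.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Sum the number of elements over all trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# First example: 7 -> 5 without bias: 7 * 5 = 35
single = nn.Linear(7, 5, bias=False)

# Second example: 3 -> 4 -> 1 with bias: (3 * 4 + 4) + (4 * 1 + 1) = 21
mlp = nn.Sequential(
    nn.Linear(3, 4),
    nn.ReLU(),  # the activation has no learnable parameters
    nn.Linear(4, 1),
)

print(count_params(single), count_params(mlp))
```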

