
In [1]:
import torch
from torch import nn



To build a neural network in PyTorch, you fill in a "template" class: a subclass of `nn.Module`  
that defines an `__init__` method (which creates the layers) and a `forward` method (which applies them to the input).  
The details vary between machine learning frameworks, but the overall logic is the same.

In [2]:
in_dim = 28*28
out_dim = 7

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.neural_network = nn.Sequential(
            nn.Linear(in_dim, 512),  # input dimension, hidden1 dimension
            nn.ReLU(),               # Non-linear activation function
            nn.Linear(512, 512),     # hidden1 dimension, hidden2 dimension
            nn.ReLU(),               # Non-linear activation function
            nn.Linear(512, out_dim), # hidden2 dimension, output dimension
            #nn.Softmax(dim=1)        # special output activation function
        )

    def forward(self, x):
        x = self.flatten(x)          # input data preparation
        output = self.neural_network(x)
        return output

In [3]:
model = NeuralNetwork()
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (neural_network): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=7, bias=True)
  )
)


For parallelization, PyTorch (like other ML frameworks) is built to process data in batches.  
Therefore, the first dimension of the input is always assumed to be the `batch_size`.  
To process a single input, we need to account for this with a dummy dimension, `batch_size=1`.  

In [4]:
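try:
    X = torch.rand(28*28)     # no batch dimension
    output = model(X)         # raises an error: flatten starts at dim 1, which a 1-D tensor lacks
except Exception:
    X = torch.rand(1, 28*28)  # dummy batch dimension, batch_size=1
    output = model(X)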

In [5]:
output.shape

torch.Size([1, 7])
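
If a single sample already exists as a tensor without a batch dimension, the dummy dimension can also be added afterwards. A minimal sketch, assuming a hypothetical sample `x_single` of 784 random values:

```python
import torch

# Hypothetical single sample without a batch dimension: shape (784,)
x_single = torch.rand(28 * 28)

# unsqueeze(0) inserts a new dimension of size 1 at position 0
x_batched = x_single.unsqueeze(0)

# Indexing with None does the same thing
x_indexed = x_single[None, :]

print(x_single.shape, x_batched.shape, x_indexed.shape)
```

Either form yields a tensor of shape `(1, 784)` that can be passed to the model.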

E1: Because of the `flatten` step in the forward pass, the input shape is rather flexible: any tensor with 784 values per sample works. Try feeding in tensors of different shapes.  
Here is one example:

In [6]:
X = torch.rand(1, 2, 14, 28)
model(X)

tensor([[-0.0126, -0.0991, -0.0260,  0.0468,  0.1808,  0.0109,  0.0938]],
       grad_fn=<AddmmBackward0>)
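
To see why such inputs are accepted, it can help to look at `nn.Flatten` on its own. The sketch below uses a standalone `nn.Flatten` layer rather than the model, and the shapes in `shapes` are illustrative choices, not a complete list:

```python
import torch
from torch import nn

# With its default start_dim=1, nn.Flatten keeps dimension 0 (the batch)
# and merges all remaining dimensions into one.
flatten = nn.Flatten()

# Illustrative input shapes; each has 784 values per sample.
shapes = [(1, 28, 28), (4, 1, 28, 28), (1, 2, 14, 28)]
flat_shapes = [tuple(flatten(torch.rand(*s)).shape) for s in shapes]

for s, f in zip(shapes, flat_shapes):
    print(s, "->", f)
```

As long as the non-batch dimensions multiply to `in_dim = 784`, the first linear layer receives an input of the size it expects.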

E2: We want the output of our NN to be a probability distribution over `out_dim = 7` different categories.  
Modify the network so that its output is a proper probability distribution, i.e. that
$$\sum_{i=1}^{m} p_i = 1$$
and $0 \leq p_i \leq 1$ for every $i$, where $m$ is `out_dim`.

In [7]:
torch.isclose(torch.sum(model(X)), torch.tensor(1.))

tensor(False)

In [8]:
torch.all(model(X)>0)

tensor(False)

(Note that most of the standard `numpy` functions have a torch counterpart with the same name, e.g. `torch.sum` or `torch.exp`.)  
Hint: Look up the so-called softmax function $\text{Softmax}: \mathbb{R}^m \to \mathbb{R}^m$
$$
\left(\text{Softmax}(x)\right)_i = \frac{e^{x_i}}{\sum_{j=1}^m e^{x_j}}
$$
for $x \in \mathbb{R}^m$
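
The formula can be checked numerically on a small example before touching the network. The sketch below is independent of the model: `logits` is a made-up batch of one with $m = 3$, and the manual computation is compared against the built-in `torch.softmax`:

```python
import torch

# Made-up raw scores: a batch of one sample with m = 3 entries
logits = torch.tensor([[2.0, 1.0, 0.1]])

# Direct transcription of the formula, normalizing along dim=1 (per sample)
exp_logits = torch.exp(logits)
manual = exp_logits / exp_logits.sum(dim=1, keepdim=True)

# Built-in version; dim=1 again means "per sample"
builtin = torch.softmax(logits, dim=1)

print(manual)
print(torch.allclose(manual, builtin))  # both computations agree
print(builtin.sum(dim=1))               # each row sums to 1
```

Note the `dim=1` argument: the normalization has to run over the category dimension, not over the batch dimension.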