# Tutorial 3

## 1. Backpropagation (without Python)

Take a look at the following simple neural network.

<center>
<img src="https://raw.githubusercontent.com/SvenKlaassen/DL-Lecture-Figures/main/figures/simple_nn.png" alt="Simple Network" style="width: 800px;"/><br>
<b>Figure 1:</b> Simple Neural Network.</center>

Here, $\sigma_1$ and $\sigma_2$ are ReLU activation functions. Furthermore, the weights are set to

$$ v_1=\begin{pmatrix} 1\\2 \end{pmatrix}, v_2=\begin{pmatrix} -3\\4 \end{pmatrix} \text{ and }w=\begin{pmatrix} -2\\1\\0.5 \end{pmatrix}.$$

Assume you observe the features $x=(1,1)^T$ with outcome $y=4$.

* Calculate one forward pass through the model to obtain the predicted value $\hat{y}$. What is the corresponding squared loss?

* Perform a backward pass through the model and calculate the gradient of the loss function with respect to all parameters of the model.

* (Optional) Validate your results using Python.
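
For the optional validation step, the sketch below uses PyTorch's autograd. The wiring of the network is an assumption, since it depends on Figure 1: two hidden units $h_j = \sigma_1(v_j^T x)$ and an output $\hat{y} = \sigma_2(w_0 + w_1 h_1 + w_2 h_2)$, where the first entry of $w$ is treated as a bias. If the figure connects the units differently, only the `forward` function needs to change.

```python
import torch

# Parameters from the exercise; requires_grad=True lets autograd track them.
v1 = torch.tensor([1.0, 2.0], requires_grad=True)
v2 = torch.tensor([-3.0, 4.0], requires_grad=True)
w = torch.tensor([-2.0, 1.0, 0.5], requires_grad=True)

x = torch.tensor([1.0, 1.0])
y = torch.tensor(4.0)

def forward(x):
    # ASSUMED layout (check against Figure 1): w[0] acts as a bias,
    # w[1] and w[2] weight the two hidden units.
    h1 = torch.relu(v1 @ x)
    h2 = torch.relu(v2 @ x)
    return torch.relu(w[0] + w[1] * h1 + w[2] * h2)

y_hat = forward(x)
loss = (y - y_hat) ** 2
loss.backward()  # populates .grad for v1, v2 and w

print("prediction:", y_hat.item())
print("loss:", loss.item())
print("dL/dv1:", v1.grad)
print("dL/dv2:", v2.grad)
print("dL/dw:", w.grad)
```

Comparing the printed gradients with the hand-derived ones is a quick way to spot a sign error or a forgotten ReLU derivative.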

## 2. Neural Network

### Dataset

Start by importing the necessary modules.

In [None]:
import torch
import torchvision
import torchvision.datasets as datasets

Next, we load the famous [MNIST](http://yann.lecun.com/exdb/mnist/) dataset.

In [None]:
# Convert the images to tensors and normalize them with the mean and
# standard deviation computed on the training set.
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.1307,), (0.3081,)),
])

mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
mnist_test = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

The dataset contains $70{,}000$ observations of handwritten digits with corresponding labels ($60{,}000$ for training and $10{,}000$ for testing). Each image is converted to a tensor and normalized ($0.1307$ and $0.3081$ are the mean and the standard deviation of the training set).

In [None]:
len(mnist_train), len(mnist_test)

* Construct a `DataLoader` for the training set by using `torch.utils.data.DataLoader` directly (batch sizes of $64$ and $128$) and set a seed for the random number generator. Additionally, create a `DataLoader` for the test set.

In [None]:
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=64, shuffle=True)
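
One way to make the shuffling reproducible is to pass a seeded `torch.Generator` to the `DataLoader`. The sketch below assumes a batch size of $64$ for training and $128$ for testing. To run without a download, it uses small random data sets (the hypothetical `toy_train` and `toy_test`) in place of `mnist_train` and `mnist_test`; the helper `make_train_loader` is also only illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST-like shapes (assumption: replace with mnist_train / mnist_test).
toy_train = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
toy_test = TensorDataset(torch.randn(128, 1, 28, 28), torch.randint(0, 10, (128,)))

def make_train_loader(seed):
    # A dedicated generator fixes the shuffling order without touching the global RNG state.
    g = torch.Generator().manual_seed(seed)
    return DataLoader(toy_train, batch_size=64, shuffle=True, generator=g)

toy_loader_a = make_train_loader(seed=42)
toy_loader_b = make_train_loader(seed=42)

# The test set does not need to be shuffled.
toy_test_loader = DataLoader(toy_test, batch_size=128, shuffle=False)

# Two loaders built with the same seed yield the same first batch.
labels_a = next(iter(toy_loader_a))[1]
labels_b = next(iter(toy_loader_b))[1]
print(torch.equal(labels_a, labels_b))
```

Calling `torch.manual_seed` before creating the loader is a common alternative; a separate generator has the advantage that other random operations do not change the batch order.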

* Take a look at the data examples by iterating the `DataLoader` once over the training set. Save the batch (e.g. as `example`).

**Digression:** Another helpful built-in function of Python is `enumerate`. Try looping over the following list using `enumerate`.

In [None]:
# Avoid calling the variable `list`, which would shadow the built-in type.
items = ['Item 1', 'Item 2', 'Item 3']

In [None]:
for counter, item in enumerate(items):
    print(counter, item)

For our purposes, `enumerate` can be used on the `DataLoader`.

In [None]:
examples = enumerate(train_loader)
batch_idx, (example_features, example_labels) = next(examples)

The following code chunk displays the features as images together with the corresponding labels.

In [None]:
import matplotlib.pyplot as plt

fig = plt.figure()
for i in range(4):
    plt.subplot(1,4,i+1)
    plt.tight_layout()
    plt.imshow(example_features[i][0], cmap='gray', interpolation='none')
    plt.title("Label: {}".format(example_labels[i]))
    plt.xticks([0,14,28])
    plt.yticks([0,14,28])
plt.show()

### Implementing a Neural Network

Import `torch.nn` and construct a sequential neural network for classification (choose your own structure, but keep it reasonably small). Starting with an `nn.Flatten` layer might be very helpful.
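
As one possible starting point, the sketch below builds a small fully connected classifier. The single hidden layer and its width of $128$ are arbitrary choices, not something the exercise prescribes.

```python
import torch
from torch import nn

# Flatten turns each 1x28x28 image into a vector of length 784.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),  # hidden width 128 is an arbitrary choice
    nn.ReLU(),
    nn.Linear(128, 10),       # one output (logit) per digit class
)

# Sanity check on a random batch with MNIST-like shape.
dummy_batch = torch.randn(4, 1, 28, 28)
logits = model(dummy_batch)
print(logits.shape)
```

The network outputs raw logits without a softmax layer, because `CrossEntropyLoss` applies the log-softmax internally.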

* Use the `CrossEntropyLoss` (`reduction='sum'` might be helpful later) and `torch.optim.Adam` for optimization.
* Initialize all weights of your network from a normal distribution with standard deviation $0.01$.
* Before we start training our network, use our `train_loader` and `test_loader` to evaluate the loss on our training and testing data (this might take a while).
* Set the number of epochs to $1$ and start training your network. Afterwards, evaluate the loss.
* We would like to get more information printed during the training process. Increase the number of epochs to $5$ and, every $50$ batches, print the current epoch and the loss on the batch (you can add the number of observations used so far in the epoch as well). Furthermore, evaluate the loss on the training set at the end of each epoch. Before starting, reinitialize the weights randomly.
* Next, take a look at specific predictions on your example batch from above and compare them to the corresponding labels.
* Use this to evaluate the accuracy (share of correctly predicted labels) on the test set at each epoch. Additionally, log all the printed losses. Again, reinitialize the weights randomly before starting.
* Plot the logged losses in a suitable plot.
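
The steps in the list above can be combined roughly as follows. This is a minimal sketch under several assumptions: random toy tensors stand in for MNIST so that the code runs without a download, the helper names `init_weights`, `evaluate` and `train` are hypothetical, and the architecture, learning rate and batch size are arbitrary. Setting the biases to zero is also a choice made here. With `reduction='sum'`, dividing the summed loss by the number of observations gives the average loss over a whole data set.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

def toy_loader(n):
    # Toy stand-in for MNIST (assumption): swap in the real loaders.
    data = TensorDataset(torch.randn(n, 1, 28, 28), torch.randint(0, 10, (n,)))
    return DataLoader(data, batch_size=64, shuffle=True)

toy_train_loader, toy_test_loader = toy_loader(640), toy_loader(128)

def init_weights(module):
    # Draw the weights of every linear layer from N(0, 0.01^2); biases start at zero.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.zeros_(module.bias)

def evaluate(model, loader, loss_fn):
    # Return the average loss and the accuracy over all observations of a loader.
    model.eval()
    total_loss, correct, n = 0.0, 0, 0
    with torch.no_grad():
        for features, labels in loader:
            logits = model(features)
            total_loss += loss_fn(logits, labels).item()  # summed over the batch
            correct += (logits.argmax(dim=1) == labels).sum().item()
            n += labels.size(0)
    return total_loss / n, correct / n

def train(model, train_loader, test_loader, num_epochs=5, log_every=50):
    loss_fn = nn.CrossEntropyLoss(reduction='sum')
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.apply(init_weights)  # reinitialize the weights before every run
    history = {"batch_loss": [], "train_loss": [], "test_accuracy": []}
    for epoch in range(1, num_epochs + 1):
        model.train()
        n_seen = 0
        for batch_idx, (features, labels) in enumerate(train_loader):
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()
            optimizer.step()
            n_seen += labels.size(0)
            if batch_idx % log_every == 0:
                batch_loss = loss.item() / labels.size(0)
                history["batch_loss"].append(batch_loss)
                print(f"epoch {epoch} [{n_seen} obs] batch loss {batch_loss:.4f}")
        train_loss, _ = evaluate(model, train_loader, loss_fn)
        _, test_acc = evaluate(model, test_loader, loss_fn)
        history["train_loss"].append(train_loss)
        history["test_accuracy"].append(test_acc)
        print(f"epoch {epoch} train loss {train_loss:.4f} test accuracy {test_acc:.3f}")
    return history

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
history = train(model, toy_train_loader, toy_test_loader, num_epochs=2, log_every=5)
```

The lists in `history` can be passed directly to `plt.plot` for the final plotting step. On random toy labels the accuracy stays near chance level; meaningful numbers require the real MNIST loaders.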

Neural network modules as well as optimizers can save and load their internal state using `.state_dict()` and `.load_state_dict(state_dict)`. See [here](https://pytorch.org/tutorials/beginner/saving_loading_models.html) for saving and loading models. What does the dictionary save?
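
As a small illustration, the following sketch saves and restores the state of a model and an optimizer. The tiny model, the temporary directory and the file name `checkpoint.pt` are placeholders chosen for this example.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = torch.optim.Adam(model.parameters())

# Inspect the entries of the model's state dictionary.
for name, tensor in model.state_dict().items():
    print(name, tuple(tensor.shape))

# Save both state dictionaries in one checkpoint file (path is a placeholder).
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, path)

# Restore into freshly constructed objects with the same structure.
restored_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
restored_optimizer = torch.optim.Adam(restored_model.parameters())
checkpoint = torch.load(path)
restored_model.load_state_dict(checkpoint["model"])
restored_optimizer.load_state_dict(checkpoint["optimizer"])
```

Note that the state dictionary does not store the architecture itself, which is why the restored model has to be constructed with the same layers before loading.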

### GPUs
Now, use GPUs for training your model.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
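
A minimal sketch of how the `device` object is typically used: the model is moved once, and every batch is moved inside the training loop. The tiny model and the random batch are placeholders; on a machine without a GPU the code simply runs on the CPU.

```python
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move the model parameters to the device once, before creating the optimizer.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss(reduction='sum')

# Placeholder batch; in the training loop these tensors come from the DataLoader.
features = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

# Inputs and labels must live on the same device as the model.
features, labels = features.to(device), labels.to(device)

optimizer.zero_grad()
loss = loss_fn(model(features), labels)
loss.backward()
optimizer.step()
print(loss.item(), next(model.parameters()).device)
```

Calling `.item()` copies the scalar loss back to the CPU, so logging code does not need to change when the model runs on a GPU.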