# An overview

---
### In this lesson you'll learn:

- to understand the relationship between a (logistic) regression and neural networks.
---

In this notebook we will have a last look at different model architectures and how they are related.

For this we will work again with the MNIST data set. First we load the data and normalize it. We also one-hot encode the target variable.

In [None]:
import numpy as np
import scipy
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression
import torch
from torch import nn
from torch import optim
def min_max(x):
    return (x - np.min(x)) / (np.max(x) - np.min(x))
def one_hot(x):
    """The labels of the images still need to be encoded into vectors of length 10"""
    dod = len(set(x)) # Checks how many different digits there are in the data set
    target = np.zeros([x.shape[0], dod]) # A matrix of zeros is created
    for i in range(x.shape[0]): # The for-loop puts a 1 in the matrix depending on which label the image has
        target[i, x[i]] = 1

    return target

In [None]:
train_data = np.genfromtxt('https://uni-muenster.sciebo.de/s/xSU1IKM6ui4WKAV/download', delimiter=',', skip_header =False)
test_data = np.genfromtxt('https://uni-muenster.sciebo.de/s/fByBt5wd24chROg/download', delimiter=',', skip_header =False) 
train_labels=train_data[:,0].astype(int) 
train_images = train_data[:,1:]

test_labels=test_data[:,0].astype(int)
test_images = test_data[:,1:]

In [None]:
train_targets=one_hot(train_labels)
test_targets = one_hot(test_labels)

In [None]:
train_images = min_max(train_images)
test_images = min_max(test_images)

In [None]:
plt.imshow(train_images[0].reshape([28, 28]), cmap="gray")
print("Correct Label: %s" % train_labels[0])

# Linear regression

We start with a simple regression. A linear regression can also be represented as a neural network.

<img src="Img/summary/lin_reg.png" width ="450px">

The output is composed of the weighted sum of the pixel values. That is, each pixel is assigned a weight. *The neuron may also still have a bias associated with it, but this is not shown*. 
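
As a minimal sketch of this weighted sum, the snippet below uses a made-up four-pixel "image". The names `pixels`, `weights`, and `bias` and all their values are purely illustrative assumptions; they are not taken from a fitted model.

```python
import numpy as np

# Hypothetical 4-pixel "image" and weights, chosen only for illustration
pixels = np.array([0.0, 0.5, 1.0, 0.25])
weights = np.array([0.2, -0.4, 0.6, 0.8])
bias = 0.1

# The output neuron computes the weighted sum of all pixels plus the bias
output = np.dot(weights, pixels) + bias
print(output)
```

For MNIST the same computation would run over 784 pixels instead of four.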

Since we only have one output neuron, we can only make a single prediction. This means that we can only do a binary classification. For example: Is there a five shown in the picture? YES or NO.

We can also perform this linear regression in Python.
To do this, we use the `train_images` as input and the column of `train_targets` that corresponds to the label `5`. Because counting starts at zero, this is the column with index 5 (the sixth column), `train_targets[:,5]`.

In [None]:
linear_reg_model = LinearRegression()
linear_reg_model.fit(train_images, train_targets[:,5])

We can output the weights with `linear_reg_model.coef_`. There are 784 weights in total, one for each pixel. 

In [None]:
linear_reg_model.coef_[:5], linear_reg_model.coef_.shape

To see how well our model performs, we can use the `.predict()` function to predict the value for our test data set. Remember, we only want to predict zeros or ones.

`1` = "Five"

`0` = "Not a five"

In [None]:
pred_y = linear_reg_model.predict(test_images)
pred_y

These values are neither `0` nor `1`. We have to round them first.

In [None]:
pred_y = np.round(pred_y)
pred_y

Now we can calculate the accuracy:

In [None]:
np.mean(pred_y== test_targets[:,5])

`0.9456` is not so bad. But keep in mind that only about 10% of the images show a `5`. This also means that about 90% of the images do not show a `5`. For these 90%, our model has to predict a `0` to be correct. A model that simply predicts `0` for every image would already reach an accuracy of about `0.90`. So our model may perform worse than the accuracy first suggests.
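
The baseline argument can be checked with a small sketch. The labels below are made up so that exactly one image in ten shows a five; `always_zero` is a hypothetical "model" that never predicts a five.

```python
import numpy as np

# Hypothetical labels: 1 of 10 images shows a five (illustrative only)
labels = np.array([3, 5, 1, 0, 7, 9, 2, 4, 8, 6])
is_five = (labels == 5).astype(int)

# A "model" that always answers "Not a five"
always_zero = np.zeros_like(is_five)

# Share of images for which this trivial answer is correct
baseline_accuracy = np.mean(always_zero == is_five)
print(baseline_accuracy)
```

On the real data, the same baseline could be computed as `np.mean(test_targets[:,5] == 0)`.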


We have another problem. Take a look at the predictions for `pred_y[1677]` or `pred_y[1162]`.

In [None]:
pred_y[1677],pred_y[1162]

These values are neither `1` nor `0`. How could this happen?
In a linear regression we do not use activation functions. Therefore, the output of a linear regression can take arbitrarily large or small values. Values below `-0.5` or above `1.5` are not rounded to `0` or `1`, but to other integers such as `-1` or `2`.
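
A short sketch of the rounding behaviour, using made-up regression outputs (the values in `raw_outputs` are assumptions chosen for illustration):

```python
import numpy as np

# Made-up regression outputs, including values far away from 0 and 1
raw_outputs = np.array([-1.2, -0.3, 0.4, 0.8, 1.3, 1.7])

# Rounding maps each value to the nearest integer, which is not always 0 or 1
rounded = np.round(raw_outputs)
print(rounded)
```

The first and the last value end up as `-1` and `2`, which are not valid class labels.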

This is not a problem at first, since we could assign `0` or `1` to these values manually. But the problem remains in principle: how do we prevent the model from predicting values that are outside the possible range?

A `sigmoid` function does the trick. It transforms all values so that they always lie between `0` and `1`. So we can simply "attach" a `sigmoid` function to the linear regression. This would solve the problem. And that is exactly what happens in logistic regression.
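
The following sketch shows this effect on a few made-up values. The helper `sigmoid` and the `0.5` threshold are illustrative assumptions, not the internals of any library.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up outputs of the weighted sum, some far outside [0, 1]
raw_outputs = np.array([-4.0, -1.2, 0.0, 1.7, 4.0])
probabilities = sigmoid(raw_outputs)
print(probabilities)

# Values of at least 0.5 are read as "Five", everything else as "Not a five"
predictions = (probabilities >= 0.5).astype(int)
print(predictions)
```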

<img src="Img/summary/log_reg.png" width="450px">

We can also calculate this in Python.

In [None]:
log_reg_model = LogisticRegression(solver = 'lbfgs', max_iter=1000,  random_state=134)
log_reg_model.fit(train_images, train_targets[:,5])
log_reg_model.coef_[0,256:261], log_reg_model.coef_.shape

We again obtain `784` weights, one for each pixel. We also see that our predictions for the test set are now already `0` or `1`, so we no longer have to round them.

In [None]:
pred_y = log_reg_model.predict(test_images)
pred_y

Again, we calculate the accuracy:

In [None]:
np.mean(pred_y == test_targets[:,5])

With the help of logistic regression, we could increase the accuracy. So far, however, we only distinguish between "Five" and "Not a five". But actually we want to be able to recognize every digit. This is also possible with a logistic regression.

For this to work we need multiple output nodes. Ten in total, one for each digit. 

<img src="Img/summary/log_reg_2.png" width="540px">

We now use the `softmax` function. Unlike the `sigmoid` function, the `softmax` function ensures that the sum of activations over the 10 outputs is always exactly `1`. If we were to use the `sigmoid` function, it could happen that an image is detected as a five and a one. 
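
To make the difference concrete, here is a small sketch with made-up raw outputs for the ten neurons. The `logits` values and both helper functions are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtracting the maximum keeps np.exp numerically stable
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

# Made-up raw outputs of the 10 output neurons (digits 0 to 9)
logits = np.array([-3.0, 2.5, -1.0, -2.3, -1.8, 2.4, -0.5, -2.2, -2.0, -1.7])

# With sigmoid, every neuron is judged on its own: two digits pass 0.5
detected_by_sigmoid = np.flatnonzero(sigmoid(logits) > 0.5)
print(detected_by_sigmoid)

# With softmax, the activations compete and sum to 1
activations = softmax(logits)
print(np.sum(activations), np.argmax(activations))
```

With `sigmoid`, both the one and the five are "detected"; with `softmax`, only one digit can have the largest activation.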


For this logistic regression we now pass the complete `train_labels` vector, which contains the digit of every image, instead of a single column of `train_targets`.

In [None]:
log_reg_model_complete = LogisticRegression(solver = 'lbfgs', max_iter=1000,  random_state=134)
log_reg_model_complete.fit(train_images, train_labels)
log_reg_model_complete.coef_[0,256:261], log_reg_model_complete.coef_.shape

The weight matrix `log_reg_model_complete.coef_` now has the size `[10,784]`. So each output neuron has 784 weights.
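
The sketch below shows how a weight matrix of this shape turns one image into a predicted digit. The random `weights`, `bias`, and `image` are placeholders with the right shapes, not the fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters with the same shapes as in the fitted model
weights = rng.normal(size=(10, 784))   # one row of 784 weights per digit
bias = rng.normal(size=10)             # one bias per output neuron
image = rng.random(784)                # stand-in for one normalized image

# Each output neuron computes its own weighted sum of the pixels
scores = weights @ image + bias

# The neuron with the largest score determines the predicted digit
predicted_digit = int(np.argmax(scores))
print(scores.shape, predicted_digit)
```

With the fitted model, `log_reg_model_complete.coef_` and `log_reg_model_complete.intercept_` would take the place of the random placeholders.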

We also obtain predictions, which contain the digit recognized by the model. 

In [None]:
pred_y = log_reg_model_complete.predict(test_images)
pred_y

Again, we calculate the accuracy:

In [None]:
np.mean(pred_y==test_labels)

The model can correctly recognize 92.5% of the digits. This is of course worse than before, but this time the task is much more complex, because it is not only about one digit, but all digits need to be recognized. 
With a simple logistic regression we can achieve a relatively good accuracy. 

So why do we need neural networks? They can give us the last percentage points of performance. The difference between our current model and a neural network is that our current model lacks hidden layers.
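
As a rough sketch of what a hidden layer adds, the following forward pass compares both variants in plain NumPy. All weights are random placeholders, and the hidden layer size of 10 is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random(784)  # stand-in for one normalized image

# Without a hidden layer: pixels -> 10 outputs (as in the logistic regression)
w_out = rng.normal(size=(10, 784))
b_out = np.zeros(10)
scores_direct = w_out @ image + b_out

# With one hidden layer of 10 neurons: pixels -> hidden (ReLU) -> 10 outputs
w_hidden = rng.normal(size=(10, 784))
b_hidden = np.zeros(10)
hidden = np.maximum(0.0, w_hidden @ image + b_hidden)  # ReLU activation

w_final = rng.normal(size=(10, 10))
b_final = np.zeros(10)
scores_hidden = w_final @ hidden + b_final

print(scores_direct.shape, hidden.shape, scores_hidden.shape)
```

Both variants end in ten scores; the hidden layer only inserts a non-linear intermediate step between the pixels and the outputs.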

<img src="Img/summary/nn1.png" width="450px">

We can add hidden layers using PyTorch:

In [None]:

train_images =torch.tensor(train_images, dtype = torch.float32)
test_images =torch.tensor(test_images, dtype = torch.float32)

train_labels =torch.tensor(train_labels, dtype = torch.long)
test_labels =torch.tensor(test_labels, dtype = torch.long)

In [None]:
torch.manual_seed(1234)  # seed before building the model so the initial weights are reproducible
simple_nn = nn.Sequential(nn.Linear(784,10),nn.ReLU() ,nn.Linear(10,10))
loss_funktion = nn.CrossEntropyLoss()
updater = optim.Adam(simple_nn.parameters(), lr = 0.01)

In [None]:
for epoch in range(135):
    updater.zero_grad()
    output = simple_nn(train_images)
    loss = loss_funktion(output, train_labels)
    loss.backward()
    updater.step()
    
    

The code should be familiar to you by now. However, if we look at the accuracy, we see that the neural network performs comparably to the logistic regression.

In [None]:
pred_y=torch.argmax(simple_nn(test_images),1).detach().numpy()

In [None]:
np.mean(pred_y==test_labels.numpy())

This can have several reasons. In principle, neural networks do not have to work better than simpler models.
In this case, however, it is probably due to the network itself. We could use more or larger layers, or change the optimizer or the learning rate, to improve the performance of the network.



# Exercise


Today's notebook is shorter than usual, so you have more time for the exercise. 
Today's exercise is about applying what you have learned to the MNIST dataset again. 

You will be given three data sets (randomly shuffled):

- Training data: use it to train the model
- Test data: use it to evaluate the trained network
- External test dataset: images only, no labels → you send me the predictions for this dataset.

The external dataset has no labels (at least none that you can see). 
In the exercise task, you will also hand in **your** predictions for the external dataset.

We will then compare your predictions to the true values. 
*Which of you creates the best model?*

An initial model has been given.
From there, you can improve the network.

There are several ways to improve it. Here are a few examples:

- Adjust hyperparameters, e.g. number of epochs, batch size, learning rate or number of hidden layers.
- Add batchnorm and dropout.
- Use a CNN.
- Try different optimizers.
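
As one possible starting point, here is a hedged sketch of a network with batchnorm and dropout. The name `improved_nn`, the layer size of 128, and the dropout rate of 0.2 are assumptions for illustration, not recommended values.

```python
import torch
from torch import nn

# Hypothetical layer sizes and dropout rate; tune them for your own model
improved_nn = nn.Sequential(
    nn.Linear(784, 128),
    nn.BatchNorm1d(128),   # normalizes the activations of each batch
    nn.ReLU(),
    nn.Dropout(p=0.2),     # randomly zeroes activations during training
    nn.Linear(128, 10),
)

# Dummy batch of 32 flattened images, just to check the output shape
dummy_batch = torch.rand(32, 784)
improved_nn.eval()  # switch off dropout and use the running batch statistics
with torch.no_grad():
    output = improved_nn(dummy_batch)
print(output.shape)
```

Because dropout and batchnorm behave differently during training and evaluation, remember to call `.train()` before each training epoch and `.eval()` before predicting.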

Be careful not to overfit to the test dataset. This can also happen.
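
One common way to reduce this risk is to hold out part of the training data as a validation set, tune on that, and look at the test data only occasionally. This is a suggestion, not something the exercise prescribes. The sketch below uses random stand-in arrays and a hypothetical 80/20 split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in arrays; in the exercise these would be the training images and labels
images = rng.random((1000, 784))
labels = rng.integers(0, 10, size=1000)

# Shuffle the indices once and hold out one fifth as a validation set
indices = rng.permutation(len(images))
n_val = len(images) // 5
val_idx, train_idx = indices[:n_val], indices[n_val:]

val_images, val_labels = images[val_idx], labels[val_idx]
fit_images, fit_labels = images[train_idx], labels[train_idx]
print(fit_images.shape, val_images.shape)
```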

At the end of the notebook is a cell that you can use to create and save the predictions for the external data set. 
This will be saved in the `data` folder as `my_prediction.csv`.

Please submit both your prediction and the notebook.


# Data 

First load all the data.

In [None]:
import numpy as np
import scipy
from matplotlib import pyplot as plt
import torch
from torch import nn
from torch import optim
from torch.utils import data
import pandas as pd

def min_max(x):
    return (x - 0.) / (255. - 0.)


In [None]:
train_data = np.genfromtxt('https://uni-muenster.sciebo.de/s/xSU1IKM6ui4WKAV/download', delimiter=',', skip_header =False)
train_labels=train_data[:,0].astype(int) 
train_images = min_max(train_data[:,1:])
del train_data 

test_data = np.genfromtxt('https://uni-muenster.sciebo.de/s/fByBt5wd24chROg/download', delimiter=',', skip_header =False)
test_labels=test_data[:,0].astype(int)
test_images = min_max(test_data[:,1:])
del test_data 

external_images=min_max(np.genfromtxt('https://uni-muenster.sciebo.de/s/X0yKbGdk3XaU8fy/download', delimiter=',', skip_header =False))

In [None]:
train_images = torch.tensor(train_images, dtype = torch.float32)
test_images = torch.tensor(test_images, dtype = torch.float32)

train_labels = torch.tensor(train_labels, dtype = torch.long)
test_labels = torch.tensor(test_labels, dtype = torch.long)

external_images = torch.tensor(external_images, dtype = torch.float32)

In [None]:
train_data = data.TensorDataset(train_images, train_labels) 
loader = data.DataLoader(train_data, batch_size = 32)

##  Model

In [None]:
torch.manual_seed(1234)  # seed before building the model so the initial weights are reproducible
simple_nn = nn.Sequential(nn.Linear(784,10),nn.ReLU() ,nn.Linear(10,10))
loss_funktion = nn.CrossEntropyLoss()
updater = optim.Adam(simple_nn.parameters(), lr = 0.0001)

In [None]:
for epoch in range(20):
    simple_nn.train()
    for images, labels in loader:
        updater.zero_grad()
        output = simple_nn(images)
        loss = loss_funktion(output, labels)
        loss.backward()
        updater.step()
    
    simple_nn.eval()
    # EVALUATE #
    # Train
    output = simple_nn(train_images)
    loss = loss_funktion(output, train_labels)
    prediction = torch.argmax(output,1).detach().numpy()
    acc  = np.mean(prediction == train_labels.detach().numpy()  )
    # Test
    output = simple_nn(test_images)
    test_loss = loss_funktion(output, test_labels)
    prediction = torch.argmax(output,1).detach().numpy()
    test_acc  = np.mean(prediction == test_labels.detach().numpy()  )
    print(f"Epoch {epoch} | Training Loss: {loss:.3f} Training Acc: {acc:.3f} | Test Loss: {test_loss:.3f} Test Acc:  {test_acc:.3f}")
        

# External data

In [None]:
simple_nn.eval()
externe_pred = torch.argmax(simple_nn(external_images),1).detach().numpy()

The next cell generates a `.csv` file with your predictions. Please submit this with the notebook.

In [None]:
pd.DataFrame(externe_pred.reshape(10000,1)).to_csv("../data/my_prediction.csv", index =False,header =False)