## Model Evaluation
In this task, we will take a look at the evaluation of a classifier. As a starting point, we give you some functions that allow you to train a classifier with [PyTorch](https://pytorch.org). PyTorch is a bit more advanced than scikit-learn and needs you to do more of the busy work yourself.
On the other hand, it gives you the freedom to manually create your own training schemes and network configurations. Together with [tensorflow](https://www.tensorflow.org), it is the de facto industry standard when it comes to neural network training.
For this task, it's not really necessary to understand the PyTorch code. But if you're interested in learning PyTorch, try to follow along by reading the comments. Don't worry if you don't understand everything. Just be aware that for our purpose `torch.tensor` behaves mostly like `numpy.array` which you should be familiar with by now.


A great opportunity to learn more is the [tutorial section of the PyTorch homepage](https://pytorch.org/tutorials/) which provides many tutorials on different machine learning tasks. If you want to find information on a given function, take a look at the [documentation section](https://pytorch.org/docs/stable/index.html).

In [61]:
# install the required packages
!python -m pip install torch
!python -m pip install sklearn



In [62]:
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
torch.manual_seed(0)

<torch._C.Generator at 0x7f1da40f2390>

### Load the Data
We load the boston dataset from scikit learn and normalize it with z-score transformation. Afterwards we shuffle the data because it is ordered by class and this order would mess with the kfold crossvalidation you are going to implement.

In [76]:
wine = load_wine()
data = wine["data"]

target = torch.from_numpy(wine["target"])

# scale the data to mean = 0 and var = 1
scaler = StandardScaler()
scaler.fit(data)
data = torch.from_numpy(scaler.transform(data)).float()

# because the data is ordered, we need to shuffle it
shuffle_seed = torch.randperm(data.shape[0])
data = data[shuffle_seed]
target = target[shuffle_seed]

attribute_count = data.shape[1]
label_count = len(wine["target_names"])

As you should already know you can print the information about the dataset with the `DESCR` key:

In [122]:
print(wine["DESCR"])

.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash  
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0

### Define and Initialize Model
Here we define our model. Some of the values are fixed by our dataset, like the number of input neurons and the number of output neurons. 
The hidden layers can be varied and are given in here as a list of integers, where every element defines the number of neurons in a hidden layer, i.e.<br>`hidden_layers=[10,10]` defines a neural network with two hidden layers with 10 neurons each.

In [78]:
def create_model(hidden_layers = [],input_size = attribute_count, output_size = label_count, 
                 activation = torch.nn.ReLU(),output_activation = torch.nn.Identity()):
    # the list of sizes is usefull to manage the input and output sizes of the layers in our network
    sizes = [input_size] + hidden_layers + [output_size]
    
    # the list of layers will be combined by using nn.Sequential to easily create a feed forwad network
    # from a list of layers and activation functions
    layers = []
    
    for i in range(len(sizes)-1):
        # choose the inner activation function for all layers except the last one
        act = activation if i < len(sizes) -2 else output_activation
        # concatenate a Linear layer and the activation function with our layer list
        layers+= [torch.nn.Linear(sizes[i],sizes[i+1]),act]
    
    # create the neural network from our layer list
    return torch.nn.Sequential(*layers)

### Training Loop
The `train_model` function contains the training loop for a given model. Mandatory inputs are `model`, `data`, `target`, and `epochs`. 

In [79]:
def trainModel(model, data, target, epochs, lr = 0.01, batchsize = 20, shuffle = False):
    # define the loss function (here, we use cross-entropy) 
    criterion = torch.nn.CrossEntropyLoss()
    
    # the optimization method for the weights (Adam or Stochastic Gradient Descent (SGD) are common practice)
    optimizer = torch.optim.Adam(model.parameters(),lr=lr)
    
    # loop n times over the dataset
    for epoch in range(epochs):
        
        # it may be helpful to shuffle your data every epoch, we don't do it here for reproducibility reasons
        if shuffle:
            seed = torch.randperm(data.shape[0])
            data = data[seed]
            target = target[seed]
        for index in range(0,len(data),batchsize):
            
            # create the batch
            batch_last = index + batchsize
            data_batch = data[index: batch_last] if batch_last < data.shape[0] else data[index: -1]
            target_batch = target[index: batch_last] if batch_last < target.shape[0] else target[index: -1]
            
            # forwad pass
            
            # calculate the outputs
            scores = model(data_batch)
            # calculate the loss
            loss = criterion(scores, target_batch)
            
            # backpropagation
            
            # the gradient has to be set to zero before calculating the new gradients
            optimizer.zero_grad()
            # propagate the loss backwards through the network
            loss.backward()
            # update the weights
            optimizer.step()
    
    # return the trained model       
    return model
    

### Make predictions
The `predict` function takes the model and some data and predicts the class associated with the data.

In [80]:
def predict(data, model):
    # if a single datapoint is given, we have to unsqueeze it to handle more than one datapoint as well
    if len(data.shape) == 1:
        data = data.unsqueeze(0)
    
    # find the output of our model that has the largest value and use it as our prediction
    _, prediction = model(data).max(1)
    
    return prediction

### Accuracy
The `calculate_accuracy` function takes some data and the asscociated targets and a model and calculates the accuracy of the model on the data.

In [81]:
def calculate_accuracy(data, target, model):
    num_samples = data.shape[0]
    
    # switch to evaluation mode
    model.eval()
    
    with torch.no_grad():
        # generate the predictions for the data from our model
        prediction = predict(data,model)
        # count correct predictions
        num_correct = (prediction == target).sum()
        # calculate accuracy (proportion of correct predictions)
        return num_correct/num_samples


### Putting it all together
Now it is time to put it all together. We create a model with two hidden layers with 100 neurons each and train it on the whole dataset. After that, we evaluate the accuracy of our model on the training data.

In [110]:
model = create_model([10])
model = trainModel(model, data, target, 50, lr = 0.01)
accuracy = calculate_accuracy(data,target, model)

print(f"Accuracy on training set: {accuracy*100:.2f}")

Accuracy on training set: 100.00


### Task 3.1 Cross-Validation
100% accuracy looks really good, but maybe it is too good to be true. Until now, we had trained on the same set of data that we used for evaluation. This is a bad practice, especially for small datasets like ours, because our network may be overfitting.

Now it's your turn: Write a function that performs k-fold cross-validation on the dataset to test the quality of your model. To do so, split the data into k training and test subsets. Train multiple models on the training data and evaluate the accuracy on the test data.

Return the different results as well as the average accuracy.

In [127]:
def kfold_crossvalidation(k, data, target, hidden = [100], epochs  = 50, lr = 0.01):
    
    accuracies = []
    
    for i in range(k):
        model = create_model(hidden)
        
        #sd = subdata (sorry)
        sd = []
        sdtarget = []
        test = []
        testtarget = []
        for j in range(data.shape[0]):
            if j % k == i:
                test.append(data[j])
                testtarget.append(target[j])
            else:
                sd.append(data[j])
                sdtarget.append(target[j])
        
        sd = torch.stack(sd)
        test = torch.stack(test)    
        sdtarget = torch.stack(sdtarget)
        testtarget = torch.stack(testtarget)
        
        submodel = trainModel(model, sd, sdtarget, epochs, lr)
        accuracies.append(calculate_accuracy(test, testtarget, submodel).item()*100)
                          
    avg_accuracy = sum(accuracies)/len(accuracies)
                          
    return accuracies, avg_accuracy

print(kfold_crossvalidation(8, data, target))

([100.0, 95.652174949646, 100.0, 95.45454382896423, 100.0, 100.0, 95.45454382896423, 100.0], 98.32015782594681)


### Test k-fold Cross-Validation
The following code can be used to test your implementation. If your average accuracy is at ~98%, you probably have done it correctly.

In [None]:
torch.manual_seed(0)
kfold_crossvalidation(10, data, target, [10], 10, 0.01)

### Task 3.2 Calculate the Confusion Matrix
Since our model is not as perfect as it seems, let's find out what kind of misclassifications it produced. Write a function that calculates the confusion matrix for our data. To do so, create a m x m matrix with m as the number of classes. Predict the classes and compare the prediction with the target. Sum up how often the inputs of each class were assigned to the different classes by our classifier.

In [166]:
def confusion_matrix(data,target,model): 
    model.eval()
    with torch.no_grad():
        result = predict(data,model)
    
    m = torch.unique(target)
    m = m.size(dim=0)
    confusion_matrix = [[0 for i in range(m)] for j in range(m)]
    
    for k in range(target.size(dim=0)):
        confusion_matrix[target[k].item()][result[k].item()] += 1
        
    return confusion_matrix

### Test the Confusion Matrix
The following code can be used to test your confusion matrix.

In [167]:
torch.manual_seed(0)

training_data = data[0:120]
training_target = target[0:120]

test_data = data[120:-1]
test_target = target[120:-1]

model = create_model([10])
model = trainModel(model, training_data, training_target, 10, lr = 0.01)

print(confusion_matrix(test_data, test_target, model))

[[19, 1, 0], [0, 25, 0], [0, 0, 12]]


### Task 3.3 Interpret the Confusion Matrix

Take a look at the confusion matrix you calculated. What kind of error(s) did our model produce?

In [None]:
# There is one error where an object that should be in 
# the first class (0) gets assigned to the second class (1)