# Performance Assessment

<img src="./img/5_performance_assessment.png" width="500px"><br><br>

If we want to know how well an estimator performs, we can calculate some performance metrics.

Here we will dig a little deeper into

- Accuracy
- Confusion matrices

and meet new metrics

- Precision
- Recall
- F1 score



### Performance Assessment for kNN-based Classification

Starting point: We create a classification model using kNN (example from last lecture).

In [None]:
# Load required libraries (basics)
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.inspection import DecisionBoundaryDisplay

In [None]:
# we set k to 7
n_neighbors = 7

# import some data to play with
iris = load_iris()

# we only take the first two features. We could avoid this ugly
# slicing by using a two-dim dataset
X = iris.data[:, :2]
y = iris.target

# split data (67% training, 33% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [None]:
# we create an instance of kNN Classifier and fit the data.
knn_clf = KNeighborsClassifier(n_neighbors, weights='distance')
knn_clf.fit(X_train, y_train)

#### Confusion matrix
A confusion matrix is a table (often drawn as a figure) that describes the performance of a classifier. It is usually computed on a test dataset for which the ground truth is known. Each true class is compared with each predicted class, so we can see how many samples are classified correctly and how many are confused with another class. While constructing this table, we come across several key metrics that are very important in the field of machine learning.
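To make the layout concrete, here is a minimal sketch with three classes. The labels `y_true_demo` and `y_pred_demo` are invented for illustration and are not taken from the iris data. In scikit-learn's convention, rows correspond to the true class and columns to the predicted class, so the diagonal holds the correctly classified samples.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 3-class problem (illustration only)
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 1])

# Rows: true class, columns: predicted class
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)

# Correct predictions sit on the diagonal, everything else is a confusion
n_correct = int(np.trace(cm_demo))
n_total = int(cm_demo.sum())
accuracy_demo = n_correct / n_total
print(f"Correct: {n_correct} of {n_total}, accuracy: {accuracy_demo:.2f}")
```

Reading across a row shows which classes a given true class gets mistaken for.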

Let's consider a binary classification case where the output is either 0 or 1:
* **True positives:** These are the samples for which we predicted 1 as the output and the ground truth is 1 too.
* **True negatives:** These are the samples for which we predicted 0 as the output and the ground truth is 0 too.
* **False positives:** These are the samples for which we predicted 1 as the output but the ground truth is 0. This is also known as a Type I error.
* **False negatives:** These are the samples for which we predicted 0 as the output but the ground truth is 1. This is also known as a Type II error.
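The four cases above can be counted directly from their definitions. The following is a small sketch on made-up binary labels (the arrays are assumptions for illustration); it also checks the counts against `confusion_matrix`, which for the labels 0 and 1 is laid out as `[[TN, FP], [FN, TP]]`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels, invented purely for illustration
y_true_bin = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_bin = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# Count each outcome directly from its definition
tp = int(np.sum((y_pred_bin == 1) & (y_true_bin == 1)))  # true positives
tn = int(np.sum((y_pred_bin == 0) & (y_true_bin == 0)))  # true negatives
fp = int(np.sum((y_pred_bin == 1) & (y_true_bin == 0)))  # false positives (Type I)
fn = int(np.sum((y_pred_bin == 0) & (y_true_bin == 1)))  # false negatives (Type II)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")

# The same four numbers, read from sklearn's 2x2 confusion matrix
tn_sk, fp_sk, fn_sk, tp_sk = (int(v) for v in confusion_matrix(y_true_bin, y_pred_bin).ravel())
print(f"sklearn: TP={tp_sk}, TN={tn_sk}, FP={fp_sk}, FN={fn_sk}")
```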

Depending on the problem at hand, we may have to optimize our algorithm to reduce the false positive or the false negative rate. For example, in a biometric identification system, it is very important to avoid false positives, because the wrong people might get access to sensitive information. Let's see how to create a confusion matrix.

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# Compare the true test labels with the labels predicted by the model
print(f"True labels:\n{y_test}")
y_pred = knn_clf.predict(X_test)
print(f"Predicted labels:\n{y_pred}")

In [None]:
# Create confusion matrix
cm = confusion_matrix(y_test, y_pred)

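# visualize the confusion matrix (rows: true labels, columns: predicted labels)
ax = plt.axes()
sns.heatmap(cm, annot=True, annot_kws={"size": 30}, cmap="Greens", ax=ax)
ax.set_title('Confusion Matrix')
ax.set_xlabel('Predicted label')
ax.set_ylabel('True label')
plt.show()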

print('Accuracy:')
print(f' Train: {accuracy_score(y_train, knn_clf.predict(X_train))*100:.2f} %')
print(f' Test:  {accuracy_score(y_test, y_pred)*100:.2f} %')


#### Quality measures and cross-validation for a classification model

Measures to evaluate the quality of a machine learning technique (precision, recall, accuracy and F1 are the most important ones):<br><br>

<img src="./img/5_performance_metric.png" width="1000px">
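As a rough sketch of how these measures relate to the confusion matrix counts, the snippet below computes them by hand for a binary case, assuming the usual definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of precision and recall). The counts are hypothetical and chosen only for illustration; the result is compared against scikit-learn's own scoring functions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical confusion matrix counts (illustration only)
tp_n, fp_n, fn_n, tn_n = 8, 2, 4, 6

# Metrics computed by hand from the counts
precision_manual = tp_n / (tp_n + fp_n)
recall_manual = tp_n / (tp_n + fn_n)
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)
accuracy_manual = (tp_n + tn_n) / (tp_n + fp_n + fn_n + tn_n)

# Build label arrays that produce exactly these counts
y_true_m = np.array([1] * tp_n + [0] * fp_n + [1] * fn_n + [0] * tn_n)
y_pred_m = np.array([1] * tp_n + [1] * fp_n + [0] * fn_n + [0] * tn_n)

print(f"Precision: {precision_manual:.3f} (sklearn: {precision_score(y_true_m, y_pred_m):.3f})")
print(f"Recall:    {recall_manual:.3f} (sklearn: {recall_score(y_true_m, y_pred_m):.3f})")
print(f"F1:        {f1_manual:.3f} (sklearn: {f1_score(y_true_m, y_pred_m):.3f})")
print(f"Accuracy:  {accuracy_manual:.3f} (sklearn: {accuracy_score(y_true_m, y_pred_m):.3f})")
```

For more than two classes, these measures are computed per class and then averaged, which is what the `weighted` averages in the classification report refer to.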

In [None]:
# Classification report (as created by sklearn)
print('\n', classification_report(y_test, y_pred, target_names=iris.target_names))

### Cross validation

<img src="./img/5_cross_validation.png" width="700px"><br><br>

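`Cross validation` splits the __training data__ into segments (___k-folds___) and trains the model iteratively: in each iteration, one fold is held out for validation and the model is trained on the remaining folds.

Each iteration is then scored on its held-out fold, and the scores are averaged over all folds. This way the errors are leveled out.

Cross validation does not by itself produce a better model, but it gives a more reliable estimate of how well the model generalizes than a single train/test split.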
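The iteration can be written out by hand. The sketch below assumes the same setup as above (first two iris features, kNN with k = 7 and distance weighting) and uses `StratifiedKFold`, which is what scikit-learn selects for a classifier when `cv` is given as an integer. The variable names are chosen for this example only.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Same data as in the example above: first two iris features
X_cv, y_cv = load_iris(return_X_y=True)
X_cv = X_cv[:, :2]

model = KNeighborsClassifier(n_neighbors=7, weights="distance")
skf = StratifiedKFold(n_splits=3)

fold_scores = []
for train_idx, val_idx in skf.split(X_cv, y_cv):
    # Train a fresh copy of the model on all folds except the held-out one
    fold_model = clone(model)
    fold_model.fit(X_cv[train_idx], y_cv[train_idx])
    # Score it on the held-out fold only
    y_val_pred = fold_model.predict(X_cv[val_idx])
    fold_scores.append(accuracy_score(y_cv[val_idx], y_val_pred))

fold_scores = np.array(fold_scores)
print(f"Accuracy per fold: {fold_scores}")
print(f"Mean accuracy:     {fold_scores.mean():.2f}")
```

Every sample is used for validation exactly once, so the mean score depends less on one particular split.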

Calculating quality measures through cross-validation:

In [None]:
from sklearn import model_selection

# this defines the cross-validation strategy (cv parameter)
num_folds = 3
# 3 means: 3 folds (sets) for cross-validation

# Scoring functions of sklearn (original data: X and y!)
# All scores are fractions between 0 and 1, so they are printed without a percent sign
accuracy_values = model_selection.cross_val_score(knn_clf, X, y, scoring='accuracy', cv=num_folds)
print(f"Accuracy:  {accuracy_values.mean():.2f} -> {accuracy_values}")

precision_values = model_selection.cross_val_score(knn_clf, X, y, scoring='precision_weighted', cv=num_folds)
print(f"Precision: {precision_values.mean():.2f} -> {precision_values}")

recall_values = model_selection.cross_val_score(knn_clf, X, y, scoring='recall_weighted', cv=num_folds)
print(f"Recall:    {recall_values.mean():.2f} -> {recall_values}")

f1_values = model_selection.cross_val_score(knn_clf, X, y, scoring='f1_weighted', cv=num_folds)
print(f"F1:        {f1_values.mean():.2f} -> {f1_values}")

In [None]:
# Draw decision boundaries
DecisionBoundaryDisplay.from_estimator(knn_clf, X, alpha=0.4, response_method="predict")
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, edgecolor="k")
plt.title("Decision boundary for kNN Classifier")
plt.show()

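Compared to the single train/test split, the `cross validation` scores of the same `kNN` classifier come out slightly higher here. Cross validation does not change the model itself; it averages the scores over several splits, so the estimate depends less on one particular split:<br><br>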

<table>
    <tr>
        <td style="border: none">&nbsp;</td>
        <td style="text-align:center">single fold kNN</td>
        <td style="text-align:center">cross validation kNN</td>
    </tr>
    <tr>
        <td>Accuracy</td>
        <td style="text-align:center">0.76</td>
        <td style="text-align:center">0.79</td>
    </tr>
    <tr>
        <td>Precision</td>
        <td style="text-align:center">0.74</td>
        <td style="text-align:center">0.81</td>
    </tr>
    <tr>
        <td>Recall</td>
        <td style="text-align:center">0.74</td>
        <td style="text-align:center">0.79</td>
    </tr>
    <tr>
        <td>F1</td>
        <td style="text-align:center">0.74</td>
        <td style="text-align:center">0.78</td>
    </tr>
</table>

<span style="font-size:70%">Scores from a single train/test split compared with mean scores from 3-fold cross validation.</span>