# Quality metrics for classification

We will compute a range of classification quality metrics for a neural network trained on a binary classification problem.

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import numpy.testing as np_testing
import matplotlib.pyplot as plt

# Load MAGIC Data Set

<center><img src="img/magic1.jpg" width="1000"></center>

Source: https://magic.mpp.mpg.de/

In [None]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/magic/magic04.data

Features description:
- **Length:** continuous # major axis of ellipse [mm]
- **Width:** continuous # minor axis of ellipse [mm]
- **Size:** continuous # 10-log of sum of content of all pixels [in #phot]
- **Conc:** continuous # ratio of sum of two highest pixels over fSize [ratio]
- **Conc1:** continuous # ratio of highest pixel over fSize [ratio]
- **Asym:** continuous # distance from highest pixel to center, projected onto major axis [mm]
- **M3Long:** continuous # 3rd root of third moment along major axis [mm]
- **M3Trans:** continuous # 3rd root of third moment along minor axis [mm]
- **Alpha:** continuous # angle of major axis with vector to origin [deg]
- **Dist:** continuous # distance from origin to center of ellipse [mm]
- **Label:** g,h # gamma (signal), hadron (background)

g = gamma (signal): 12332 \
h = hadron (background): 6688
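
The two classes are not balanced, which matters when reading accuracy later on. As a rough sketch using only the counts quoted above, the class fractions and the accuracy of a hypothetical classifier that always answers "gamma" can be computed as follows (the variable names are illustrative and not part of the notebook):

```python
# Class counts quoted above for the MAGIC data set.
n_gamma = 12332   # label "g" (signal)
n_hadron = 6688   # label "h" (background)
n_total = n_gamma + n_hadron

frac_gamma = n_gamma / n_total
frac_hadron = n_hadron / n_total

# Accuracy of a hypothetical classifier that always predicts the majority class.
majority_baseline_accuracy = max(frac_gamma, frac_hadron)

print(f"gamma fraction:  {frac_gamma:.3f}")
print(f"hadron fraction: {frac_hadron:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
```

Any trained model should be compared against this baseline rather than against 50%.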

In [None]:
f_names = np.array(["Length", "Width", "Size", "Conc", "Conc1", "Asym", "M3Long", "M3Trans", "Alpha", "Dist"])

data = pd.read_csv("magic04.data", header=None, names=list(f_names)+["Label"])
data.head()

# Data preparation

In [None]:
# prepare a matrix of input features
X = data[f_names].values

# prepare a vector of true labels
y = 1 * (data['Label'].values == "g")

In [None]:
# print sizes of X and y
X.shape, y.shape

In [None]:
X[:2]

In [None]:
y[:5]

# Train / test split

In [None]:
from sklearn.model_selection import train_test_split

# Split data into train and test samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42, stratify=y)

# Preprocessing

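Scale the input features with `StandardScaler`, where $\mu$ and $\sigma$ are the per-feature mean and standard deviation estimated on the train sample: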
$$
X_{new} = \frac{X - \mu}{\sigma}
$$
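
To make the formula concrete, here is a minimal NumPy sketch of the same transformation on synthetic data. The arrays `X_train_toy` and `X_test_toy` are made up for illustration. The statistics are estimated on the train part only and then reused for the test part, and the standard deviation is the population one (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# Synthetic stand-ins for the real feature matrices (illustrative only).
rng = np.random.default_rng(0)
X_train_toy = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test_toy = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

# Estimate per-feature mean and standard deviation on the train sample only.
mu = X_train_toy.mean(axis=0)
sigma = X_train_toy.std(axis=0)  # population std, ddof=0

# Apply the same shift and scale to both samples.
X_train_scaled = (X_train_toy - mu) / sigma
X_test_scaled = (X_test_toy - mu) / sigma

print("train mean:", X_train_scaled.mean(axis=0).round(6))
print("train std: ", X_train_scaled.std(axis=0).round(6))
```

The scaled train sample has zero mean and unit variance per feature by construction. The test sample is only approximately standardized, because it is scaled with the train statistics.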

In [None]:
# Import StandardScaler
from sklearn.preprocessing import StandardScaler

# Create object of the class and set up its parameters
ss = StandardScaler()

# Estimate mean and sigma values
ss.fit(X_train)

# Scale train and test samples
X_train = ss.transform(X_train)
X_test = ss.transform(X_test)

# Fit a classifier


Now let's create a neural network and fit it.

In [None]:
#!pip install pytorch-lightning 

In [None]:
import torch
from torch.nn import functional as F
from torch import nn
import pytorch_lightning as pl

class Model(pl.LightningModule):

    def __init__(self):
        super().__init__()
        
        # define all layers of the network
        self.net = nn.Sequential(
                                nn.Linear(10, 10), 
                                nn.Tanh(), 
                                nn.Linear(10, 1), 
                                nn.Sigmoid())

    
    def forward(self, x):
        # make a prediction for x
        return self.net(x)

    # calculate loss function values
    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.binary_cross_entropy(y_hat, y)
        return loss

    # define optimizer to fit the network
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

In [None]:
from torch.utils.data import TensorDataset, DataLoader

# combine X and y into one pytorch tensor dataset
dataset_train = TensorDataset(torch.tensor(X_train, dtype=torch.float), 
                              torch.tensor(y_train.reshape(-1, 1), dtype=torch.float))

# loader divides our train data into batches
train_loader = DataLoader(dataset_train, batch_size=128)

In [None]:
# define a trainer to fit our network
trainer = pl.Trainer(max_epochs=10)

# init our network
model = Model()

# fit the network
trainer.fit(model, train_loader)

# Make Predictions

Predict the **probability** of the positive class.

In [None]:
# make predictions
y_proba_test = model(torch.tensor(X_test, dtype=torch.float))[:, 0].detach().numpy()

In [None]:
print("Truth  : ", y_test[:10])
print("Proba  : ", y_proba_test[:10])

Predict the class **label**.

In [None]:
# transform the predicted probabilities into predicted labels {0, 1}
y_pred_test = 1 * (y_proba_test > 0.5)
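
The threshold of 0.5 is a choice, not a property of the model. A small sketch on made-up probabilities (`proba_toy` is illustrative, not the network output) shows how the predicted labels change as the threshold moves:

```python
import numpy as np

# Made-up predicted probabilities of the positive class.
proba_toy = np.array([0.05, 0.35, 0.50, 0.65, 0.95])

# A higher threshold marks fewer observations as positive.
for threshold in (0.3, 0.5, 0.7):
    labels = (proba_toy > threshold).astype(int)
    print(f"threshold={threshold}: labels={labels}, positives={labels.sum()}")

# Same rule as in the cell above: strictly greater than 0.5.
labels_at_half = (proba_toy > 0.5).astype(int)
```

With a strict `>` comparison, a probability of exactly 0.5 is assigned to the negative class.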

In [None]:
print("Truth  : ", y_test[:10])
print("Pred   : ", y_pred_test[:10])

# Label-based Quality Metrics

Consider a confusion matrix:

<center><img src='img/cm.png'></center>


* TP (true positive) - positives correctly predicted as positive
* FP (false positive) - negatives incorrectly predicted as positive (type I error)
* FN (false negative) - positives incorrectly predicted as negative (type II error)
* TN (true negative) - negatives correctly predicted as negative
* Pos (Neg) - total number of positives (negatives)

Quality metrics:

* $ \text{Accuracy} = \frac{TP + TN}{Pos+Neg}$
* $ \text{Error rate} = 1 - \text{Accuracy}$
* $ \text{Precision} =\frac{TP}{TP + FP}$ 
* $ \text{Recall} =\frac{TP}{TP + FN} = \frac{TP}{Pos}$
* $ \text{F}_\beta \text{-score} = (1 + \beta^2) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{(\beta^2 \cdot \mathrm{precision}) + \mathrm{recall}}$
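
The formulas above can be checked by hand on a small example. The sketch below counts the confusion matrix entries with plain NumPy and plugs them into the definitions. The helper names `confusion_counts` and `f_beta` and the toy labels are illustrative assumptions, and the code deliberately avoids `sklearn.metrics`.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn

def f_beta(precision, recall, beta=1.0):
    """F-beta score from precision and recall (assumes a non-zero denominator)."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Toy labels, made up for illustration.
y_true_toy = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_toy = [1, 1, 0, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true_toy, y_pred_toy)
accuracy = (tp + tn) / (tp + fp + fn + tn)
error_rate = 1 - accuracy
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = f_beta(precision, recall, beta=1.0)

print(tp, fp, fn, tn, accuracy, error_rate, precision, recall, f1)
```

Here TP=3, FP=1, FN=1 and TN=3, so accuracy, precision, recall and F1 all equal 0.75. The sketch does not guard against division by zero, which occurs when no positives are predicted or present.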

### Task 1
Complete a function that computes TP, FP, TN, FN, Accuracy, Error rate, Precision, Recall and F1-score metrics for a classifier.

**Hint:** use the implementations of the metrics from `sklearn.metrics`, as shown below. Example for the confusion matrix: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

def quality_metrics_report(y_true, y_pred):
    """
    Parameters
    ----------
    y_true: array-like of shape (n_samples,)
        Ground truth (correct) target values.
    y_pred: array-like of shape (n_samples,)
        Estimated targets as returned by a classifier.
        
    Returns
    -------
    List of metric values: [tp, fp, fn, tn, accuracy, error_rate, precision, recall, f1]
    """
    
    ### BEGIN SOLUTION
    
    ### END SOLUTION
    
    return [tp, fp, fn, tn, accuracy, error_rate, precision, recall, f1]

In [None]:
quality_metrics_report([0, 1, 0, 1], [1, 1, 1, 0])

Expected output:

<center>   
    
```python
[1, 2, 1, 0, 0.25, 0.75, 0.3333333333333333, 0.5, 0.4]
    
``` 
    
</center>

In [None]:
### BEGIN HIDDEN TESTS
actual  = quality_metrics_report([0, 1, 0, 1], [1, 1, 1, 0])
desired = [1, 2, 1, 0, 0.25, 0.75, 0.3333333333333333, 0.5, 0.4]
np_testing.assert_almost_equal(actual, desired, decimal=1)
### END HIDDEN TESTS

Now let's compute all these quality metrics for the classifier trained above.

In [None]:
metrics_report = pd.DataFrame(columns=['TP', 'FP', 'FN', 'TN', 'Accuracy', 'Error rate', 'Precision', 'Recall', 'F1'])

metrics_report.loc['Model', :] = quality_metrics_report(y_test, y_pred_test)

metrics_report

# Probability-based Quality Metrics

### ROC curve

The receiver operating characteristic curve (ROC) measures how well a classifier separates two classes. 

Let $y_{\rm i}$ be the true label and $\hat{y}_{\rm i}$ the predicted score for the $i^{\rm th}$ observation. 

The index sets of positive and negative observations are $\mathcal{I}_{\rm 1} = \{i: y_{\rm i}=1\}$ and $\mathcal{I}_{\rm 0} = \{i: y_{\rm i}=0\}$. 

The sums of observation weights $w_{\rm i}$ for each class are $W_{\rm 1} = \sum_{i \in \mathcal{I}_{\rm 1}} w_{\rm i}$ and  $W_{\rm 0} = \sum_{i \in \mathcal{I}_{\rm 0}} w_{\rm i}$. For unweighted data $w_{\rm i} = 1$, so $W_{\rm 1}$ and $W_{\rm 0}$ are simply the class sizes. 

For each threshold $\tau$ on the predicted score, the True Positive Rate (TPR) and False Positive Rate (FPR) are calculated, where $I[\cdot]$ is the indicator function:

\begin{equation}
TPR(\tau) = \frac{1}{W_{\rm 1}} \sum_{i \in \mathcal{I}_{\rm 1}} I[\hat{y}_{\rm i} \ge \tau] w_{\rm i}
\end{equation}

\begin{equation}
FPR(\tau) = \frac{1}{W_{\rm 0}} \sum_{i \in \mathcal{I}_{\rm 0}} I[\hat{y}_{\rm i} \ge \tau] w_{\rm i}
\end{equation}
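
As a sketch of how these two definitions turn into a curve, the code below evaluates TPR and FPR at every distinct score (plus an infinite threshold that gives the point (0, 0)) and integrates the result with the trapezoidal rule. The helper names `roc_points` and `trapezoid_area` are illustrative assumptions, and this is a direct transcription of the formulas rather than the way `sklearn` computes the curve internally.

```python
import numpy as np

def roc_points(y_true, y_score, weights=None):
    """Evaluate FPR(tau) and TPR(tau) at each distinct score, from high to low."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    w = np.ones_like(y_score) if weights is None else np.asarray(weights, dtype=float)

    pos = y_true == 1          # index set I_1
    neg = y_true == 0          # index set I_0
    w_pos = w[pos].sum()       # W_1
    w_neg = w[neg].sum()       # W_0

    # Thresholds in decreasing order; +inf selects no observation at all.
    thresholds = np.concatenate(([np.inf], np.unique(y_score)[::-1]))
    tpr = np.array([w[pos & (y_score >= t)].sum() / w_pos for t in thresholds])
    fpr = np.array([w[neg & (y_score >= t)].sum() / w_neg for t in thresholds])
    return fpr, tpr, thresholds

def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Toy example with unit weights.
fpr_toy, tpr_toy, thr_toy = roc_points([0, 1, 0, 1], [0.6, 0.9, 0.1, 0.4])
auc_toy = trapezoid_area(fpr_toy, tpr_toy)

print("FPR:", fpr_toy)
print("TPR:", tpr_toy)
print("AUC:", auc_toy)
```

For this toy input the FPR values are 0, 0, 0.5, 0.5, 1, the TPR values are 0, 0.5, 0.5, 1, 1, and the area is 0.75. The loop over thresholds is quadratic in the number of distinct scores, so it is meant for understanding rather than for large samples.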

### Task 2
Complete the function below, which computes a ROC curve and ROC AUC for a classifier.

**Hint:** use `roc_curve` and `auc` from `sklearn.metrics`.

In [None]:
from sklearn.metrics import roc_curve, auc

def roc_curve_report(y_true, y_proba):
    """
    Parameters
    ----------
    y_true: array-like of shape (n_samples,)
        Ground truth (correct) target values.
    y_proba: array-like of shape (n_samples,)
        Probabilities of the positive class predicted by a classifier.
        
    Returns
    -------
    fpr : array, shape = [>2]
        Increasing false positive rates such that element i is the false
        positive rate of predictions with score >= thresholds[i].
    tpr : array, shape = [>2]
        Increasing true positive rates such that element i is the true
        positive rate of predictions with score >= thresholds[i].
    roc_auc : float
        Area under the ROC curve defined by the fpr and tpr.
    """
    
    ### BEGIN SOLUTION

    ### END SOLUTION
    
    return fpr, tpr, roc_auc

In [None]:
roc_curve_report([0, 1, 0, 1], [0.6, 0.9, 0.1, 0.4])

Expected output:

<center>   
    
```python
(array([0. , 0. , 0.5, 0.5, 1. ]), array([0. , 0.5, 0.5, 1. , 1. ]), 0.75)
    
``` 
    
</center>

In [None]:
### BEGIN HIDDEN TESTS
actual  = roc_curve_report([0, 1, 0, 1], [0.6, 0.9, 0.1, 0.4])[0]
desired = np.array([0. , 0. , 0.5, 0.5, 1. ])
np_testing.assert_almost_equal(actual, desired, decimal=1)
### END HIDDEN TESTS

Now let's plot the ROC curve for the classifier trained above.

In [None]:
from sklearn.metrics import roc_curve, auc

fpr, tpr, roc_auc = roc_curve_report(y_test, y_proba_test)

In [None]:
plt.figure(figsize=(9, 6))
plt.plot(fpr, tpr, linewidth=3, label='Model')

plt.xlabel('FPR', size=18)
plt.ylabel('TPR', size=18)

plt.legend(loc='best', fontsize=14)
plt.grid(True)
plt.show()

print('ROC AUC:', roc_auc)