# MMU Confusion matrix & Metrics walkthrough

This notebook briefly demonstrates how the package computes confusion matrices and binary classification metrics.

In [1]:
import pandas as pd
import numpy as np

import mmu

### Data generation

We generate predictions and true labels where:
* `scores`: classifier scores
* `yhat`: estimated labels
* `y`: true labels

In [2]:
scores, yhat, y = mmu.generate_data(n_samples=10000)

### Confusion matrix only

We can compute the confusion matrix for a single run from the estimated labels, or from the classifier scores and a classification threshold.

Based on the estimated labels `yhat`:

In [3]:
# based on yhat
mmu.confusion_matrix(y, yhat)

array([[1347, 2380],
       [ 926, 5347]])

Based on the classifier scores with a classification threshold:

In [4]:
mmu.confusion_matrix(y, scores=scores, threshold=0.5)

array([[1347, 2380],
       [ 926, 5347]])
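To make the relation between scores, threshold and confusion matrix concrete, here is a minimal NumPy sketch that does not use mmu. The helper `confusion_matrix_from_scores` and the demo arrays are hypothetical, the layout `[[TN, FP], [FN, TP]]` follows the output above, and treating a score equal to the threshold as positive is an assumption rather than documented mmu behaviour.

```python
import numpy as np


def confusion_matrix_from_scores(y, scores, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for binary labels and classifier scores."""
    y = np.asarray(y).astype(bool)
    # Assumption: a score equal to the threshold counts as a positive prediction
    yhat = np.asarray(scores) >= threshold
    tn = np.sum(~y & ~yhat)
    fp = np.sum(~y & yhat)
    fn = np.sum(y & ~yhat)
    tp = np.sum(y & yhat)
    return np.array([[tn, fp], [fn, tp]], dtype=np.int64)


# Small hand-made example (not produced by mmu.generate_data)
y_demo = np.array([0, 0, 1, 1, 1, 0])
scores_demo = np.array([0.2, 0.7, 0.9, 0.4, 0.5, 0.1])
cm_demo = confusion_matrix_from_scores(y_demo, scores_demo, threshold=0.5)
print(cm_demo)
```

Rows are the observed classes and columns the estimated classes, negative first.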

## Precision-Recall

mmu has a specialised function for the precision and recall of the positive class:

In [5]:
cm, prec_rec = mmu.precision_recall(y, scores=scores, threshold=0.5, return_df=True)

Besides the point estimates of precision and recall, there is also a function to compute the precision-recall curve.

`precision_recall_curve`, also available under the alias `pr_curve`, requires you to pass the discrimination/classification thresholds.

## Auto thresholds

mmu provides a utility function, `auto_thresholds`, that returns all the thresholds that result in a different confusion matrix.

For large test sets this can be a lot of thresholds. The optional parameter `max_steps` limits their number.
This is done by weighted sampling, where the scaled inverse proximity to the next score is used as the weight. In practice this means that the extremes of the scores are oversampled. `auto_thresholds` always ensures that the lowest and highest scores are included.

In [6]:
thresholds = mmu.auto_thresholds(scores)
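The idea of "thresholds that result in a different confusion matrix" can be sketched with plain NumPy: every distinct score value is a candidate. This is not mmu's implementation. The demo data are made up, the `>=` comparison is an assumption, and the evenly spaced selection at the end is a simplified stand-in for `max_steps`, not the weighted sampling described above.

```python
import numpy as np

rng = np.random.default_rng(0)
# Rounding to two decimals creates ties, as happens with real classifier scores
scores_demo = np.round(rng.random(1000), 2)

# Every distinct score value is a candidate threshold
thresholds_demo = np.unique(scores_demo)

# Assumption: a sample is predicted positive when score >= threshold
n_pos = np.array([(scores_demo >= t).sum() for t in thresholds_demo])
# The number of predicted positives drops at every step, so each threshold
# yields a different confusion matrix
print(thresholds_demo.size, bool(np.all(np.diff(n_pos) < 0)))

# Simplified stand-in for max_steps: evenly spaced picks that keep both ends
max_steps = 10
idx = np.unique(np.linspace(0, thresholds_demo.size - 1, max_steps).round().astype(int))
subset = thresholds_demo[idx]
print(subset)
```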

## Confusion matrix and metrics

The `binary_metrics*` functions compute ten classification metrics:
* 0 - neg.precision aka Negative Predictive Value
* 1 - pos.precision aka Positive Predictive Value
* 2 - neg.recall aka True Negative Rate aka Specificity
* 3 - pos.recall aka True Positive Rate aka Sensitivity
* 4 - neg.f1 score
* 5 - pos.f1 score
* 6 - False Positive Rate
* 7 - False Negative Rate
* 8 - Accuracy
* 9 - MCC

These metrics were chosen because they are the most commonly used, and most other metrics can be computed from them. We don't provide individual functions at the moment, as the overhead of computing all of them versus one or two is negligible.

This index can be retrieved using:

In [7]:
col_index = mmu.metrics.col_index
col_index

{'neg.precision': 0,
 'neg.prec': 0,
 'npv': 0,
 'pos.precision': 1,
 'pos.prec': 1,
 'ppv': 1,
 'neg.recall': 2,
 'neg.rec': 2,
 'tnr': 2,
 'specificity': 2,
 'pos.recall': 3,
 'pos.rec': 3,
 'tpr': 3,
 'sensitivity': 3,
 'neg.f1': 4,
 'neg.f1_score': 4,
 'pos.f1': 5,
 'pos.f1_score': 5,
 'fpr': 6,
 'fnr': 7,
 'accuracy': 8,
 'acc': 8,
 'mcc': 9}

### For a single test set

In [8]:
cm, metrics = mmu.binary_metrics(y, yhat)

In [9]:
# the confusion matrix
cm

array([[1347, 2380],
       [ 926, 5347]])

In [10]:
metrics

array([0.59260889, 0.69198913, 0.36141669, 0.85238323, 0.449     ,
       0.76385714, 0.63858331, 0.14761677, 0.6694    , 0.24667191])
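The values above can be checked by hand with the standard textbook definitions of the ten metrics. The sketch below uses a hypothetical helper, `binary_metrics_from_cm`, written in plain NumPy; it is not mmu's implementation and does no handling of zero denominators.

```python
import numpy as np


def binary_metrics_from_cm(cm):
    """Compute the ten metrics, in index order 0..9, from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = np.asarray(cm, dtype=np.float64)
    neg_prec = tn / (tn + fn)
    pos_prec = tp / (tp + fp)
    neg_rec = tn / (tn + fp)
    pos_rec = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall for the given class
    neg_f1 = 2 * neg_prec * neg_rec / (neg_prec + neg_rec)
    pos_f1 = 2 * pos_prec * pos_rec / (pos_prec + pos_rec)
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return np.array([
        neg_prec, pos_prec, neg_rec, pos_rec, neg_f1,
        pos_f1, fpr, fnr, acc, mcc,
    ])


# The confusion matrix printed earlier in this notebook
metrics_demo = binary_metrics_from_cm([[1347, 2380], [926, 5347]])
print(metrics_demo)
```

Note that `fpr` and `fnr` are simply `1 - neg.recall` and `1 - pos.recall`, which is one example of a metric that can be derived from the others.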

We can also request dataframes back:

In [11]:
cm, metrics = mmu.binary_metrics(y, yhat, return_df=True)

In [12]:
metrics

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.592609 | 0.691989 | 0.361417 | 0.852383 | 0.449 | 0.763857 | 0.638583 | 0.147617 | 0.6694 | 0.246672 |


### A single run using probabilities

In [13]:
cm, metrics = mmu.binary_metrics(y, scores=scores, threshold=0.5, return_df=True)

In [14]:
cm

| | | estimated: negative | estimated: positive |
|---|---|---|---|
| observed | negative | 1347 | 2380 |
| observed | positive | 926 | 5347 |


In [15]:
metrics

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.592609 | 0.691989 | 0.361417 | 0.852383 | 0.449 | 0.763857 | 0.638583 | 0.147617 | 0.6694 | 0.246672 |


### A single run using multiple thresholds

This can be used when you want to compute a precision-recall curve, for example.

In [16]:
thresholds = mmu.auto_thresholds(scores)

In [17]:
cm, metrics = mmu.binary_metrics_thresholds(
    y=y,
    scores=scores,
    thresholds=thresholds,
    return_df=True
)

The confusion matrix output is now two-dimensional: each row contains the confusion matrix for a single threshold.

In [18]:
cm

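| | TN | FP | FN | TP |
|---|---|---|---|---|
| 0 | 0 | 3727 | 0 | 6273 |
| 1 | 0 | 3727 | 1 | 6272 |
| 2 | 1 | 3726 | 1 | 6272 |
| 3 | 2 | 3725 | 1 | 6272 |
| 4 | 3 | 3724 | 1 | 6272 |
| ... | ... | ... | ... | ... |
| 9995 | 3727 | 0 | 6268 | 5 |
| 9996 | 3727 | 0 | 6269 | 4 |
| 9997 | 3727 | 0 | 6270 | 3 |
| 9998 | 3727 | 0 | 6271 | 2 |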


Similarly, `metrics` is now two-dimensional: each row contains the metrics for a single threshold.

In [19]:
metrics

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.627300 | 0.000000 | 1.000000 | 0.000000 | 0.770970 | 1.000000 | 0.000000 | 0.6273 | 0.000000 |
| 1 | 0.000000 | 0.627263 | 0.000000 | 0.999841 | 0.000000 | 0.770895 | 1.000000 | 0.000159 | 0.6272 | -0.007708 |
| 2 | 0.500000 | 0.627325 | 0.000268 | 0.999841 | 0.000536 | 0.770942 | 0.999732 | 0.000159 | 0.6273 | 0.003724 |
| 3 | 0.666667 | 0.627388 | 0.000537 | 0.999841 | 0.001072 | 0.770990 | 0.999463 | 0.000159 | 0.6274 | 0.010532 |
| 4 | 0.750000 | 0.627451 | 0.000805 | 0.999841 | 0.001608 | 0.771037 | 0.999195 | 0.000159 | 0.6275 | 0.015609 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 0.372886 | 1.000000 | 1.000000 | 0.000797 | 0.543215 | 0.001593 | 0.000000 | 0.999203 | 0.3732 | 0.017240 |
| 9996 | 0.372849 | 1.000000 | 1.000000 | 0.000638 | 0.543176 | 0.001274 | 0.000000 | 0.999362 | 0.3731 | 0.015419 |
| 9997 | 0.372812 | 1.000000 | 1.000000 | 0.000478 | 0.543136 | 0.000956 | 0.000000 | 0.999522 | 0.3730 | 0.013353 |
| 9998 | 0.372775 | 1.000000 | 1.000000 | 0.000319 | 0.543097 | 0.000637 | 0.000000 | 0.999681 | 0.3729 | 0.010902 |
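As an illustration of how per-threshold confusion matrices lead to a precision-recall curve, the following NumPy sketch evaluates all thresholds at once through broadcasting. It uses a tiny hand-made data set and assumes that a score equal to the threshold is predicted positive; it is a sketch of the idea, not of how mmu computes it internally.

```python
import numpy as np

# Tiny hand-made example (not produced by mmu)
y_demo = np.array([0, 1, 0, 1, 1], dtype=bool)
scores_demo = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
thresholds_demo = np.sort(scores_demo)

# yhat[i, j] is True when sample j is predicted positive at threshold i
yhat = scores_demo[None, :] >= thresholds_demo[:, None]

tp = (yhat & y_demo).sum(axis=1)
fp = (yhat & ~y_demo).sum(axis=1)
fn = (~yhat & y_demo).sum(axis=1)
tn = (~yhat & ~y_demo).sum(axis=1)

# One row per threshold, columns ordered TN, FP, FN, TP as in the table above
cm_demo = np.column_stack([tn, fp, fn, tp])
print(cm_demo)

# Points of the precision-recall curve; tp + fp is never zero here because
# the highest threshold equals the highest score
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(np.column_stack([thresholds_demo, precision, recall]))
```

The boolean matrix `yhat` needs one entry per threshold and sample, so this approach only suits small examples.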


We generate multiple runs for the functions below:

In [20]:
scores, yhat, y = mmu.generate_data(n_samples=10000, n_sets=100)

### Multiple runs using a single threshold

If you have performed bootstrap or multiple train-test runs and want to evaluate the distribution of the metrics, you can use `binary_metrics_runs`.

`cm` and `metrics` are now two-dimensional arrays where each row contains the confusion matrix or metrics for a single run.

In [21]:
cm, metrics = mmu.binary_metrics_runs(
    y=y,
    scores=scores,
    threshold=0.5,
)

In [22]:
cm[:5, :]

array([[1432, 2400,  906, 5262],
       [1428, 2379,  914, 5279],
       [1387, 2368,  890, 5355],
       [1362, 2405,  876, 5357],
       [1352, 2370,  906, 5372]])
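To show what evaluating the distribution over runs can look like, the sketch below takes the five rows printed above and summarises precision and recall of the positive class across them. The column order TN, FP, FN, TP is an assumption carried over from the thresholds output earlier in the notebook, and the choice of summary statistics is purely illustrative.

```python
import numpy as np

# The five rows printed above; columns assumed to be TN, FP, FN, TP
cm_runs = np.array([
    [1432, 2400, 906, 5262],
    [1428, 2379, 914, 5279],
    [1387, 2368, 890, 5355],
    [1362, 2405, 876, 5357],
    [1352, 2370, 906, 5372],
])

# Transposing yields one array per confusion-matrix element
tn, fp, fn, tp = cm_runs.T

pos_prec = tp / (tp + fp)
pos_rec = tp / (tp + fn)

# Summarise the spread over the runs
print('precision mean/std:', pos_prec.mean(), pos_prec.std(ddof=1))
print('recall 2.5-97.5 percentiles:', np.percentile(pos_rec, [2.5, 97.5]))
```

With all 100 runs instead of five, the same lines give an empirical distribution for each metric.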

### Multiple runs using multiple thresholds

This can be used when you have performed bootstrap or multiple train-test runs and want to evaluate, for example, the different precision-recall curves.

In [23]:
cm, metrics = mmu.binary_metrics_runs_thresholds(
    y=y,
    scores=scores,
    thresholds=thresholds,
    fill=1.0
)

The confusion matrix and metrics are now cubes.

For the confusion matrix:
* rows -- the thresholds
* columns -- the confusion matrix elements
* slices -- the runs

For the metrics:

* rows -- the thresholds
* columns -- the metrics
* slices -- the runs

The memory layout is chosen such that the biggest stride is over the thresholds for the confusion matrix and over the metrics for the metrics.

The rationale is that you will want to model the confusion matrices over the runs, and each metric individually over the thresholds and runs.

In [24]:
print('shape confusion matrix: ', cm.shape)
print('strides confusion matrix: ', cm.strides)

shape confusion matrix:  (10000, 4, 100)
strides confusion matrix:  (3200, 8, 32)


In [25]:
print('shape metrics: ', metrics.shape)
print('strides metrics: ', metrics.strides)

shape metrics:  (10000, 10, 100)
strides metrics:  (800, 8000000, 8)


In [26]:
pos_recalls = metrics[:, mmu.metrics.col_index['pos.rec'], :]
pos_precisions = metrics[:, mmu.metrics.col_index['pos.prec'], :]

### Binary metrics over confusion matrices

This can be used when your methodology models or generates confusion matrices directly, rather than labels and scores.

In [27]:
# Here we use confusion_matrices to create one confusion matrix per run from the labels and scores
cm = mmu.confusion_matrices(
    y=y,
    scores=scores,
    threshold=0.5,
)

In [28]:
metrics = mmu.binary_metrics_confusion_matrices(cm, 0.0)

In [29]:
mmu.metrics_to_dataframe(metrics)

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.612489 | 0.686766 | 0.373695 | 0.853113 | 0.464182 | 0.760954 | 0.626305 | 0.146887 | 0.6694 | 0.260525 |
| 1 | 0.609735 | 0.689344 | 0.375099 | 0.852414 | 0.464466 | 0.762255 | 0.624901 | 0.147586 | 0.6707 | 0.260853 |
| 2 | 0.609135 | 0.693383 | 0.369374 | 0.857486 | 0.459881 | 0.766753 | 0.630626 | 0.142514 | 0.6742 | 0.261972 |
| 3 | 0.608579 | 0.690157 | 0.361561 | 0.859458 | 0.453622 | 0.765559 | 0.638439 | 0.140542 | 0.6719 | 0.256956 |
| 4 | 0.598760 | 0.693878 | 0.363246 | 0.855687 | 0.452174 | 0.766334 | 0.636754 | 0.144313 | 0.6724 | 0.253116 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 0.594898 | 0.684662 | 0.362105 | 0.848871 | 0.450188 | 0.757975 | 0.637895 | 0.151129 | 0.6639 | 0.242859 |
| 96 | 0.603006 | 0.698889 | 0.369247 | 0.857596 | 0.458026 | 0.770151 | 0.630753 | 0.142404 | 0.6772 | 0.261692 |
| 97 | 0.610512 | 0.687548 | 0.358765 | 0.860431 | 0.451945 | 0.764336 | 0.641235 | 0.139569 | 0.6704 | 0.255604 |
| 98 | 0.588184 | 0.690992 | 0.360515 | 0.849968 | 0.447031 | 0.762279 | 0.639485 | 0.150032 | 0.6675 | 0.242408 |
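The notebook passes a second argument (`0.0` here, `fill=1.0` earlier) without explaining it. Assuming it is the value substituted when a metric is undefined because its denominator is zero, the behaviour can be sketched in NumPy as follows. The helper `safe_divide` is hypothetical and not part of mmu.

```python
import numpy as np


def safe_divide(num, den, fill=0.0):
    """Elementwise num / den, with `fill` wherever the denominator is zero."""
    num = np.asarray(num, dtype=np.float64)
    den = np.asarray(den, dtype=np.float64)
    # Start from an array filled with the substitute value
    out = np.full(np.broadcast(num, den).shape, fill, dtype=np.float64)
    # Only divide where the denominator is non-zero
    np.divide(num, den, out=out, where=den != 0)
    return out


# Two thresholds from the multiple-threshold table earlier in the notebook:
# TN=0, FN=0 (nothing predicted negative) and TN=3, FN=1
tn = np.array([0, 3])
fn = np.array([0, 1])

neg_prec_fill0 = safe_divide(tn, tn + fn, fill=0.0)
neg_prec_fill1 = safe_divide(tn, tn + fn, fill=1.0)
print(neg_prec_fill0, neg_prec_fill1)
```

Under this assumption, the `0.000000` shown for `neg.precision` at the lowest threshold in that table would be consistent with a default fill of zero.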
