# MMU metrics walkthrough

This notebook briefly demonstrates how the package computes confusion matrices and binary classification metrics.

In [1]:
import pandas as pd
import numpy as np

import mmu

### Data generation

We generate predictions and true labels where:
* `score`: classifier scores
* `yhat`: estimated labels
* `y`: true labels

In [2]:
score, yhat, y = mmu.generate_data(n_samples=10000)

### Confusion matrix only

We can compute the confusion matrix for a single run either from the estimated labels or from the classifier scores and a classification threshold.

Based on the estimated labels `yhat`:

In [3]:
# based on yhat
mmu.confusion_matrix(y, yhat)

array([[ 830, 2863],
       [1452, 4855]])

Based on the classifier scores with a classification threshold:

In [4]:
mmu.confusion_matrix(y, score=score, threshold=0.5)

array([[ 830, 2863],
       [1452, 4855]])
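
As a rough illustration of what the two calls above compute, the following NumPy-only sketch builds the same 2x2 layout (rows are observed, columns are estimated) from labels, and from scores with a threshold. The helper name `confusion_matrix_sketch`, the toy data and the use of `score >= threshold` for the positive class are assumptions made for this example; the package may treat scores exactly at the threshold differently.

```python
import numpy as np


def confusion_matrix_sketch(y, yhat):
    """Return [[TN, FP], [FN, TP]] for binary labels (hypothetical helper)."""
    y = np.asarray(y, dtype=bool)
    yhat = np.asarray(yhat, dtype=bool)
    tn = np.sum(~y & ~yhat)
    fp = np.sum(~y & yhat)
    fn = np.sum(y & ~yhat)
    tp = np.sum(y & yhat)
    return np.array([[tn, fp], [fn, tp]], dtype=np.int64)


# Toy data, not related to the output shown above
y_toy = np.array([0, 0, 1, 1, 1, 0])
score_toy = np.array([0.2, 0.7, 0.9, 0.4, 0.6, 0.1])

# Assumption: a score at or above the threshold is labelled positive
yhat_toy = (score_toy >= 0.5).astype(int)

cm_labels = confusion_matrix_sketch(y_toy, yhat_toy)
cm_scores = confusion_matrix_sketch(y_toy, score_toy >= 0.5)
print(cm_labels)
```

Both routes give the same matrix here because `yhat_toy` was derived from the scores with the same threshold.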

## Confusion matrix and metrics

The `binary_metrics*` functions compute ten classification metrics:
 *    0 - neg.precision aka Negative Predictive Value
 *    1 - pos.precision aka Positive Predictive Value
 *    2 - neg.recall aka True Negative Rate aka Specificity
 *    3 - pos.recall aka True Positive Rate aka Sensitivity
 *    4 - neg.f1 score
 *    5 - pos.f1 score
 *    6 - False Positive Rate
 *    7 - False Negative Rate
 *    8 - Accuracy
 *    9 - MCC

These metrics were chosen because they are the most commonly used and most other metrics can be computed from them. We don't provide individual functions at the moment, as the overhead of computing all of them rather than one or two is negligible.

The mapping from metric name (including aliases) to column index can be retrieved using:

In [5]:
col_index = mmu.metrics.col_index
col_index

{'neg.precision': 0,
 'neg.prec': 0,
 'npv': 0,
 'pos.precision': 1,
 'pos.prec': 1,
 'ppv': 1,
 'neg.recall': 2,
 'neg.rec': 2,
 'tnr': 2,
 'specificity': 2,
 'pos.recall': 3,
 'pos.rec': 3,
 'tpr': 3,
 'sensitivity': 3,
 'neg.f1': 4,
 'neg.f1_score': 4,
 'pos.f1': 5,
 'pos.f1_score': 5,
 'fpr': 6,
 'fnr': 7,
 'accuracy': 8,
 'acc': 8,
 'mcc': 9}
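
For reference, the ten metrics listed above can be written out from the four confusion matrix counts using their standard definitions. The sketch below is not the package's implementation; `binary_metrics_sketch` is a hypothetical helper, it assumes the `[[TN, FP], [FN, TP]]` layout of the confusion matrix shown earlier, and it does not guard against zero denominators.

```python
import numpy as np


def binary_metrics_sketch(cm):
    """Ten metrics, in the column order listed above, from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = np.asarray(cm, dtype=np.float64)
    npv = tn / (tn + fn)              # 0 - neg.precision
    ppv = tp / (tp + fp)              # 1 - pos.precision
    tnr = tn / (tn + fp)              # 2 - neg.recall
    tpr = tp / (tp + fn)              # 3 - pos.recall
    neg_f1 = 2 * npv * tnr / (npv + tnr)
    pos_f1 = 2 * ppv * tpr / (ppv + tpr)
    fpr = fp / (fp + tn)              # 6 - equals 1 - tnr
    fnr = fn / (fn + tp)              # 7 - equals 1 - tpr
    acc = (tn + tp) / (tn + fp + fn + tp)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return np.array([npv, ppv, tnr, tpr, neg_f1, pos_f1, fpr, fnr, acc, mcc])


# The confusion matrix from the "Confusion matrix only" section
metrics_doc = binary_metrics_sketch([[830, 2863], [1452, 4855]])
print(metrics_doc.round(6))
```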

### For a single test set

In [6]:
cm, metrics = mmu.binary_metrics(y, yhat)

In [7]:
# the confusion matrix
cm

array([[ 830, 2863],
       [1452, 4855]])

We can create a dataframe from the confusion matrix using:

In [8]:
mmu.confusion_matrix_to_dataframe(cm)

| | estimated negative | estimated positive |
|---|---|---|
| observed negative | 830 | 2863 |
| observed positive | 1452 | 4855 |


In [9]:
metrics

array([ 0.36371604,  0.62904898,  0.22474953,  0.76977961,  0.27782427,
        0.69233512,  0.77525047,  0.23022039,  0.5685    , -0.00629139])

We can create a dataframe from the metrics using:

In [10]:
mmu.metrics_to_dataframe(metrics)

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.363716 | 0.629049 | 0.22475 | 0.76978 | 0.277824 | 0.692335 | 0.77525 | 0.23022 | 0.5685 | -0.006291 |
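
If you prefer to work with the raw array rather than a dataframe, individual metrics can be picked out by name through the column index. The sketch below copies a few entries of the `col_index` output and the `metrics` values shown above into local variables (`col_index_subset` and `metrics_single` are names made up for this example) so that it runs without the package.

```python
import numpy as np

# A few entries copied from the col_index output shown earlier
col_index_subset = {
    'pos.precision': 1, 'ppv': 1,
    'pos.recall': 3, 'tpr': 3,
    'mcc': 9,
}

# The metrics array printed above for the single test set
metrics_single = np.array([
    0.36371604, 0.62904898, 0.22474953, 0.76977961, 0.27782427,
    0.69233512, 0.77525047, 0.23022039, 0.5685, -0.00629139,
])

# Aliases point at the same column
ppv = metrics_single[col_index_subset['ppv']]
pos_precision = metrics_single[col_index_subset['pos.precision']]

# Select several metrics at once with a list of column indices
names = ['pos.precision', 'pos.recall', 'mcc']
selected = metrics_single[[col_index_subset[name] for name in names]]
print(dict(zip(names, selected)))
```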


### A single run using probabilities

In [11]:
cm, metrics = mmu.binary_metrics(y, score=score, threshold=0.5)

In [12]:
mmu.confusion_matrix_to_dataframe(cm)

| | estimated negative | estimated positive |
|---|---|---|
| observed negative | 830 | 2863 |
| observed positive | 1452 | 4855 |


In [13]:
mmu.metrics_to_dataframe(metrics)

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.363716 | 0.629049 | 0.22475 | 0.76978 | 0.277824 | 0.692335 | 0.77525 | 0.23022 | 0.5685 | -0.006291 |


### A single run using multiple thresholds

This can be used, for example, when you want to compute a precision-recall curve.

In [14]:
thresholds = np.linspace(1e-5, 1.0, 1000)

In [15]:
cm, metrics = mmu.binary_metrics_thresholds(
    y=y,
    score=score,
    thresholds=thresholds,
)

The confusion matrix is now a 2D array where each row contains the flattened confusion matrix for a single threshold.

In [16]:
cm

array([[   0, 3693,    0, 6307],
       [   0, 3693,    0, 6307],
       [   0, 3693,    0, 6307],
       ...,
       [3693,    0, 6307,    0],
       [3693,    0, 6307,    0],
       [3693,    0, 6307,    0]])

Similarly, `metrics` is now a 2D array where each row contains the metrics for a single threshold.

In [17]:
mmu.metrics_to_dataframe(metrics)

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0000 | 0.6307 | 0.0 | 1.0 | 0.0000 | 0.773533 | 1.0 | 0.0 | 0.6307 | 0.0 |
| 1 | 0.0000 | 0.6307 | 0.0 | 1.0 | 0.0000 | 0.773533 | 1.0 | 0.0 | 0.6307 | 0.0 |
| 2 | 0.0000 | 0.6307 | 0.0 | 1.0 | 0.0000 | 0.773533 | 1.0 | 0.0 | 0.6307 | 0.0 |
| 3 | 0.0000 | 0.6307 | 0.0 | 1.0 | 0.0000 | 0.773533 | 1.0 | 0.0 | 0.6307 | 0.0 |
| 4 | 0.0000 | 0.6307 | 0.0 | 1.0 | 0.0000 | 0.773533 | 1.0 | 0.0 | 0.6307 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 0.3693 | 0.0000 | 1.0 | 0.0 | 0.5394 | 0.000000 | 0.0 | 1.0 | 0.3693 | 0.0 |
| 996 | 0.3693 | 0.0000 | 1.0 | 0.0 | 0.5394 | 0.000000 | 0.0 | 1.0 | 0.3693 | 0.0 |
| 997 | 0.3693 | 0.0000 | 1.0 | 0.0 | 0.5394 | 0.000000 | 0.0 | 1.0 | 0.3693 | 0.0 |
| 998 | 0.3693 | 0.0000 | 1.0 | 0.0 | 0.5394 | 0.000000 | 0.0 | 1.0 | 0.3693 | 0.0 |
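
To make the threshold sweep concrete, the sketch below computes one flattened confusion matrix per threshold with NumPy broadcasting and derives the points of a precision-recall curve from it. The simulated data, the `>=` comparison and the choice to set precision to 1.0 where no sample is predicted positive are assumptions of this example, not statements about the package.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000
y_sim = rng.integers(0, 2, size=n_samples).astype(bool)
score_sim = rng.random(n_samples)  # uninformative scores in [0, 1)
thresholds_sim = np.linspace(1e-5, 1.0, 50)

# pred[i, j] is True when sample j is labelled positive at threshold i
pred = score_sim[None, :] >= thresholds_sim[:, None]

tn = np.sum(~pred & ~y_sim, axis=1)
fp = np.sum(pred & ~y_sim, axis=1)
fn = np.sum(~pred & y_sim, axis=1)
tp = np.sum(pred & y_sim, axis=1)

# One flattened confusion matrix per row: [TN, FP, FN, TP]
cm_thresholds = np.column_stack([tn, fp, fn, tp])

# Precision is undefined when nothing is predicted positive; use 1.0 there
n_pred_pos = tp + fp
precision = np.divide(
    tp, n_pred_pos, out=np.ones(len(thresholds_sim)), where=n_pred_pos > 0
)
recall = tp / (tp + fn)
```

Plotting `recall` against `precision` gives the precision-recall curve for this single run.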


Generate multiple runs for the functions below:

In [18]:
score, yhat, y = mmu.generate_data(n_samples=10000, n_sets=100)

### Multiple runs using a single threshold

When you have performed bootstrap or multiple train-test runs and want to evaluate the distribution of the metrics, you can use `binary_metrics_runs`.

`cm` and `metrics` are now two-dimensional arrays where each row contains the confusion matrix or metrics for a single run.

In [19]:
cm, metrics = mmu.binary_metrics_runs(
    y=y,
    score=score,
    threshold=0.5,
)

In [20]:
cm[:5, :]

array([[ 878, 2871, 1386, 4865],
       [ 823, 2929, 1408, 4840],
       [ 856, 2922, 1420, 4802],
       [ 870, 2786, 1423, 4921],
       [ 882, 2961, 1357, 4800]])
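
As an illustration of where such runs might come from, the sketch below bootstraps a toy test set with NumPy, builds one flattened confusion matrix per run and summarises the spread of the accuracy. All names and data are made up for this example, and the runs are stored as rows of the inputs, which may differ from the layout `mmu.generate_data` returns.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_runs = 500, 20

# Toy test set: labels and weakly informative scores
y_base = rng.integers(0, 2, size=n_samples).astype(bool)
noise = rng.normal(0.0, 0.25, size=n_samples)
score_base = np.clip(0.5 + 0.2 * (y_base.astype(float) - 0.5) + noise, 0.0, 1.0)

# Each row of idx holds the resampled indices for one bootstrap run
idx = rng.integers(0, n_samples, size=(n_runs, n_samples))
y_runs = y_base[idx]
pred_runs = score_base[idx] >= 0.5

tn = np.sum(~y_runs & ~pred_runs, axis=1)
fp = np.sum(~y_runs & pred_runs, axis=1)
fn = np.sum(y_runs & ~pred_runs, axis=1)
tp = np.sum(y_runs & pred_runs, axis=1)

# One row per run: [TN, FP, FN, TP]
cm_runs = np.column_stack([tn, fp, fn, tp])

# Distribution of the accuracy over the runs
acc_runs = (tn + tp) / n_samples
acc_mean = acc_runs.mean()
acc_lo, acc_hi = np.percentile(acc_runs, [2.5, 97.5])
print(f"accuracy: {acc_mean:.3f} [{acc_lo:.3f}, {acc_hi:.3f}]")
```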

### Multiple runs using multiple thresholds

When you have performed bootstrap or multiple train-test runs and want to, for example, compare the precision-recall curves across runs, you can use `binary_metrics_runs_thresholds`.

In [21]:
cm, metrics = mmu.binary_metrics_runs_thresholds(
    y=y,
    score=score,
    thresholds=thresholds,
    fill=1.0
)

The confusion matrix and metrics are now cubes.

For the confusion matrix:
* rows -- the thresholds
* columns -- the confusion matrix elements
* slices -- the runs

For the metrics:

* rows -- the thresholds
* columns -- the metrics
* slices -- the runs

The memory layout differs between the two arrays: the largest stride is over the thresholds for the confusion matrices and over the metrics for the metrics array.

The rationale is that you will typically want to model the confusion matrices over the runs, and each metric individually over the thresholds and runs.

In [22]:
print('shape confusion matrix: ', cm.shape)
print('strides confusion matrix: ', cm.strides)

shape confusion matrix:  (1000, 4, 100)
strides confusion matrix:  (3200, 8, 32)


In [23]:
print('shape metrics: ', metrics.shape)
print('strides metrics: ', metrics.strides)

shape metrics:  (1000, 10, 100)
strides metrics:  (800, 800000, 8)
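
The shapes and strides reported above can be reproduced with plain NumPy, which makes it easier to see which slices are contiguous in memory. This only illustrates the layout under the assumption of 8-byte elements; it is not necessarily how the package allocates its arrays, and the names `cm_like` and `metrics_like` are made up for this example.

```python
import numpy as np

n_thresholds, n_metrics, n_runs = 1000, 10, 100

# Allocated as (thresholds, runs, elements), viewed as (thresholds, elements, runs)
cm_like = np.zeros((n_thresholds, n_runs, 4), dtype=np.int64).transpose(0, 2, 1)

# Allocated as (metrics, thresholds, runs), viewed as (thresholds, metrics, runs)
metrics_like = np.zeros(
    (n_metrics, n_thresholds, n_runs), dtype=np.float64
).transpose(1, 0, 2)

print(cm_like.shape, cm_like.strides)
print(metrics_like.shape, metrics_like.strides)

# All confusion matrices over the runs for one threshold form one block
cm_one_threshold = cm_like[0].T       # shape (runs, 4)

# One metric over all thresholds and runs forms one block
one_metric = metrics_like[:, 3, :]    # shape (thresholds, runs)

print(cm_one_threshold.flags['C_CONTIGUOUS'], one_metric.flags['C_CONTIGUOUS'])
```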


In [24]:
pos_recalls = metrics[:, mmu.metrics.col_index['pos.rec'], :]
pos_precisions = metrics[:, mmu.metrics.col_index['pos.prec'], :]
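
Arrays such as `pos_recalls` and `pos_precisions` have one row per threshold and one column per run, so the spread over the runs can be summarised along the last axis. The sketch below uses a simulated stand-in of that shape (`precisions_sim` is not package output) to compute a mean curve with a 90% band.

```python
import numpy as np

rng = np.random.default_rng(1)
n_thresholds, n_runs = 50, 30

# Stand-in for pos_precisions: shape (thresholds, runs)
trend = np.linspace(0.6, 0.9, n_thresholds)[:, None]
precisions_sim = np.clip(
    trend + rng.normal(0.0, 0.02, size=(n_thresholds, n_runs)), 0.0, 1.0
)

# Summaries over the runs for every threshold
mean_precision = precisions_sim.mean(axis=1)
lower, upper = np.quantile(precisions_sim, [0.05, 0.95], axis=1)
```

The same reduction applied to the recalls gives the matching band along the other axis of the precision-recall plot.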

### Binary metrics over confusion matrices

This can be used when your methodology models and generates confusion matrices directly.

In [25]:
# We use confusion_matrices to create confusion matrices based on some output
cm = mmu.confusion_matrices(
    y=y,
    score=score,
    threshold=0.5,
)

In [26]:
metrics = mmu.binary_metrics_confusion_matrices(cm, 0.0)

In [27]:
mmu.metrics_to_dataframe(metrics)

| | neg.precision | pos.precision | neg.recall | pos.recall | neg.f1 | pos.f1 | fpr | fnr | acc | mcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.387809 | 0.628878 | 0.234196 | 0.778275 | 0.292034 | 0.695646 | 0.765804 | 0.221725 | 0.5743 | 0.014426 |
| 1 | 0.368893 | 0.622989 | 0.219350 | 0.774648 | 0.275113 | 0.690590 | 0.780650 | 0.225352 | 0.5663 | -0.006981 |
| 2 | 0.376098 | 0.621699 | 0.226575 | 0.771778 | 0.282788 | 0.688656 | 0.773425 | 0.228222 | 0.5658 | -0.001905 |
| 3 | 0.379416 | 0.638510 | 0.237965 | 0.775694 | 0.292486 | 0.700448 | 0.762035 | 0.224306 | 0.5791 | 0.015647 |
| 4 | 0.393926 | 0.618477 | 0.229508 | 0.779600 | 0.290036 | 0.689754 | 0.770492 | 0.220400 | 0.5682 | 0.010629 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 0.378191 | 0.622838 | 0.231585 | 0.769194 | 0.287264 | 0.688322 | 0.768415 | 0.230806 | 0.5663 | 0.000895 |
| 96 | 0.382051 | 0.618799 | 0.234400 | 0.766246 | 0.290543 | 0.684674 | 0.765600 | 0.233754 | 0.5634 | 0.000741 |
| 97 | 0.361099 | 0.623978 | 0.222222 | 0.766497 | 0.275129 | 0.687934 | 0.777778 | 0.233503 | 0.5637 | -0.012975 |
| 98 | 0.374890 | 0.625227 | 0.227855 | 0.772233 | 0.283438 | 0.690998 | 0.772145 | 0.227767 | 0.5682 | 0.000101 |
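
The same computation can be sketched in vectorised form for a stack of flattened confusion matrices. `metrics_from_flat_cms` is a hypothetical helper, not the package's function; it assumes the columns are ordered `[TN, FP, FN, TP]` and that a metric with a zero denominator is replaced by `fill`, which is our reading of the second argument passed above. The input rows are the first five confusion matrices printed in the single-threshold section.

```python
import numpy as np


def metrics_from_flat_cms(cms, fill=0.0):
    """cms: (n, 4) array with columns [TN, FP, FN, TP]; returns an (n, 10) array."""
    cms = np.asarray(cms, dtype=np.float64)
    tn, fp, fn, tp = cms.T

    def ratio(num, den):
        # Use `fill` wherever the denominator is zero (assumed behaviour)
        out = np.full(num.shape, fill, dtype=np.float64)
        return np.divide(num, den, out=out, where=den > 0)

    mcc_den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return np.column_stack([
        ratio(tn, tn + fn),                  # neg.precision
        ratio(tp, tp + fp),                  # pos.precision
        ratio(tn, tn + fp),                  # neg.recall
        ratio(tp, tp + fn),                  # pos.recall
        ratio(2 * tn, 2 * tn + fn + fp),     # neg.f1
        ratio(2 * tp, 2 * tp + fp + fn),     # pos.f1
        ratio(fp, fp + tn),                  # fpr
        ratio(fn, fn + tp),                  # fnr
        ratio(tn + tp, tn + fp + fn + tp),   # accuracy
        ratio(tp * tn - fp * fn, mcc_den),   # mcc
    ])


cms_doc = np.array([
    [878, 2871, 1386, 4865],
    [823, 2929, 1408, 4840],
    [856, 2922, 1420, 4802],
    [870, 2786, 1423, 4921],
    [882, 2961, 1357, 4800],
])
metrics_runs = metrics_from_flat_cms(cms_doc, fill=0.0)
print(metrics_runs.round(6))
```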
