# Confusion Matrix, Accuracy, Specificity

## The Theory

### Confusion Matrix

A confusion matrix is a table that cross-tabulates a classification model's predictions against the actual labels. Its cells hold the raw counts from which the metrics below (accuracy, recall, precision, specificity and F1) are calculated.

|                 | Predicted Positive | Predicted Negative |
| --------------- | ------------------ | ------------------ |
| Actual Positive | TP                 | FN (Type II Error) |
| Actual Negative | FP (Type I Error)  | TN                 |
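
The four cells can be counted with scikit-learn's `confusion_matrix`. This is a minimal sketch on made-up labels (`y_true_demo` and `y_pred_demo` are assumptions for illustration, not data from this notebook). By default scikit-learn sorts the labels, so for 0/1 labels the negative class comes first and the layout is `[[TN, FP], [FN, TP]]`; passing `labels=[1, 0]` gives the same orientation as the table above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative labels (assumed for this sketch): 1 = positive, 0 = negative
y_true_demo = np.array([1, 1, 1, 1, 1, 0, 0, 0])
y_pred_demo = np.array([1, 1, 1, 0, 0, 1, 0, 0])

# Default layout: rows = actual, columns = predicted, labels sorted as [0, 1]
cm_default = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_default.ravel()

# Same counts, arranged like the table above: [[TP, FN], [FP, TN]]
cm_table_layout = confusion_matrix(y_true_demo, y_pred_demo, labels=[1, 0])

print(cm_default)
print(cm_table_layout)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```

Unpacking with `ravel()` only works in this order for the default (sorted) label layout, so it is worth checking the orientation before reading counts off the matrix.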

### Error Measurements

$\text{Accuracy} = \frac{TP + TN}{\text{Total Samples}}$ \
\
$\text{Recall} = \frac{TP}{TP + FN}$ \
\
$\text{Precision} = \frac{TP}{TP + FP}$ \
\
$\text{Specificity} = \frac{TN}{FP + TN}$ \
\
$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
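
The formulas above translate directly into a few lines of arithmetic. The sketch below uses a hypothetical helper, `metrics_from_counts`, and hypothetical counts chosen only for illustration. It does not guard against division by zero (for example when there are no predicted positives).

```python
def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute the metrics above from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "recall": recall,
        "precision": precision,
        "specificity": tn / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, not taken from any real model
scores = metrics_from_counts(tp=40, fp=10, fn=20, tn=30)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Specificity is included here because, unlike the other four, it has no dedicated function among the `sklearn.metrics` imports used later in this notebook.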

### Multiple Class Error Metrics

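The multi-class case extends the binary one: rows are actual classes, columns are predicted classes, the diagonal holds the correct predictions for each class, and every off-diagonal cell is a misclassification. FP, FN and TN are only defined relative to one class at a time (one-vs-rest). The table below labels the off-diagonal cells from the perspective of Class 1; TP2 and TP3 are the correct predictions for Classes 2 and 3.

|                | Predicted Class 1 | Predicted Class 2  | Predicted Class 3  |
| -------------- | ----------------- | ------------------ | ------------------ |
| Actual Class 1 | TP1               | FN (Type II Error) | FN (Type II Error) |
| Actual Class 2 | FP (Type I Error) | TP2                | TN                 |
| Actual Class 3 | FP (Type I Error) | TN                 | TP3                |

$\text{Accuracy} = \frac{TP1 + TP2 + TP3}{\text{Total}}$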
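
To make the per-class counts concrete, here is a small sketch on made-up three-class labels (`y_true_mc` and `y_pred_mc` are assumptions for illustration). It derives TP, FN, FP and TN for every class from the confusion matrix in a one-vs-rest fashion and checks the accuracy formula against `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative three-class labels (assumed for this sketch)
y_true_mc = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
y_pred_mc = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 1])

# Rows = actual class, columns = predicted class
cm = confusion_matrix(y_true_mc, y_pred_mc)

tp = np.diag(cm)               # correct predictions per class
fn = cm.sum(axis=1) - tp       # rest of each row: actual class k, predicted as something else
fp = cm.sum(axis=0) - tp       # rest of each column: predicted class k, actually something else
tn = cm.sum() - (tp + fn + fp) # everything not involving class k

accuracy = tp.sum() / cm.sum()

print(cm)
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
print("accuracy:", accuracy, "sklearn:", accuracy_score(y_true_mc, y_pred_mc))
```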

## In Practice

In [32]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, \
                            f1_score, roc_auc_score, \
                            confusion_matrix, roc_curve, \
                            precision_recall_curve
import numpy as np

Below I define a helper that prints the metrics, then create some mock `y_true` and `y_pred` values to compare the results.

In [67]:
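def print_precision_recall_and_f1_scores(y_true: np.ndarray, y_pred: np.ndarray) -> None:
    """Print FP/FN counts, accuracy, precision, recall and F1 for binary labels.

    Args:
        y_true (np.ndarray): Ground-truth labels (0 = negative, 1 = positive).
        y_pred (np.ndarray): Predicted labels, same shape as y_true.
    """
    # Convert to arrays so the element-wise comparisons below also work for plain lists
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    print(f"num FP: {np.sum(np.logical_and(y_true == 0, y_pred == 1))}")
    print(f"num FN: {np.sum(np.logical_and(y_true == 1, y_pred == 0))}")
    print(f"accuracy: {accuracy_score(y_true, y_pred)}")
    print(f"precision: {precision_score(y_true, y_pred)}")
    print(f"recall: {recall_score(y_true, y_pred)}")
    print(f"f1_score: {f1_score(y_true, y_pred, average='binary')}")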

In [75]:
y_true = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1])
y_pred1 = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1])
y_pred2 = np.array([0, 0, 1, 1, 0, 0, 0, 1, 1])
y_pred3 = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0])

### Precision, Recall and F1 Scores

In [76]:
print_precision_recall_and_f1_scores(y_true, y_pred1)

num FP: 0
num FN: 0
accuracy: 1.0
precision: 1.0
recall: 1.0
f1_score: 1.0


In [77]:
print_precision_recall_and_f1_scores(y_true, y_pred2)

num FP: 1
num FN: 3
accuracy: 0.5555555555555556
precision: 0.75
recall: 0.5
f1_score: 0.6


In [78]:
print_precision_recall_and_f1_scores(y_true, y_pred3)

num FP: 3
num FN: 1
accuracy: 0.5555555555555556
precision: 0.625
recall: 0.8333333333333334
f1_score: 0.7142857142857143


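What's interesting about the above results is that the accuracy scores are the same for y_pred2 and y_pred3 (both make four errors in total), but the precision, recall and F1 scores differ. F1 treats false positives and false negatives symmetrically, since it can be written as $F1 = \frac{2TP}{2TP + FP + FN}$. The scores differ because the true positive counts differ: y_pred2 has 3 true positives (F1 = 6/10 = 0.6), while y_pred3 has 5 (F1 = 10/14 ≈ 0.714).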
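
A quick numerical check of how F1 depends on the raw counts, using the FP and FN values printed above and TP counts worked out by hand from the arrays (3 for y_pred2, 5 for y_pred3). `f1_from_counts` is a helper introduced here for illustration; it relies on the identity F1 = 2TP / (2TP + FP + FN), which follows from substituting the precision and recall definitions into the F1 formula.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in terms of confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


f1_pred2 = f1_from_counts(tp=3, fp=1, fn=3)  # counts for y_pred2
f1_pred3 = f1_from_counts(tp=5, fp=3, fn=1)  # counts for y_pred3

# Swapping FP and FN while holding TP fixed leaves F1 unchanged
f1_swapped = f1_from_counts(tp=3, fp=3, fn=1)

print(f1_pred2, f1_pred3, f1_swapped)
```

Precision and recall, by contrast, each look at only one error type, so they do change when FP and FN are swapped.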

### Precision Recall Curve

In [79]:
precision_recall_curve(y_true, y_pred1)

(array([0.66666667, 1.        , 1.        ]),
 array([1., 1., 0.]),
 array([0, 1]))

In [80]:
precision_recall_curve(y_true, y_pred2)

(array([0.66666667, 0.75      , 1.        ]),
 array([1. , 0.5, 0. ]),
 array([0, 1]))

In [81]:
precision_recall_curve(y_true, y_pred3)

(array([0.66666667, 0.625     , 1.        ]),
 array([1.        , 0.83333333, 0.        ]),
 array([0, 1]))
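
`precision_recall_curve` is meant to receive continuous scores (such as predicted probabilities) as its second argument. With hard 0/1 predictions there are only two distinct thresholds, which is why each result above has just three points. The sketch below reuses the same ground-truth labels with made-up scores (`y_scores` is hypothetical, not the output of a real model) to show the curve getting one point per distinct score.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true_demo = np.array([1, 0, 1, 1, 1, 0, 1, 0, 1])
# Hypothetical scores for the positive class, invented for this sketch
y_scores = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.3, 0.35, 0.6, 0.95])

precision, recall, thresholds = precision_recall_curve(y_true_demo, y_scores)

# precision and recall have one more entry than thresholds:
# the final (precision=1, recall=0) point has no corresponding threshold
for p, r, t in zip(precision, recall, thresholds):
    print(f"score >= {t:.2f}: precision={p:.3f}, recall={r:.3f}")
```

Each printed row is the precision and recall obtained by predicting positive whenever the score is at least that threshold, which is the trade-off the curve is meant to visualize.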