In [2]:
# Imports
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


# [Metrics](https://keras.io/api/metrics/)

A metric is a function that is used to judge the performance of your model.

Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. Note that you may use any loss function as a metric.

The metrics available in the Keras library are:
1. Accuracy metrics
    1. Accuracy class
    2. BinaryAccuracy class
    3. CategoricalAccuracy class
    4. TopKCategoricalAccuracy class
    5. SparseTopKCategoricalAccuracy class
2. Probabilistic metrics
    1. BinaryCrossentropy class
    2. CategoricalCrossentropy class
    3. SparseCategoricalCrossentropy class
    4. KLDivergence class
    5. Poisson class
3. Regression metrics
    1. MeanSquaredError class
    2. RootMeanSquaredError class
    3. MeanAbsoluteError class
    4. MeanAbsolutePercentageError class
    5. MeanSquaredLogarithmicError class
    6. CosineSimilarity class
    7. LogCoshError class
4. Classification metrics based on True/False positives & negatives
    1. AUC class
    2. Precision class
    3. Recall class
    4. TruePositives class
    5. TrueNegatives class
    6. FalsePositives class
    7. FalseNegatives class
    8. PrecisionAtRecall class
    9. SensitivityAtSpecificity class
    10. SpecificityAtSensitivity class
5. Image segmentation metrics
    1. MeanIoU class
6. Hinge metrics for "maximum-margin" classification
    1. Hinge class
    2. SquaredHinge class
    3. CategoricalHinge class




***
## Introduction

Choosing the right metric is crucial when evaluating machine learning (ML) models, and many metrics have been proposed for different applications. In some applications, a single metric may not give you the whole picture of the problem you are solving, and you may want to use a subset of the metrics. We will discuss a few of them here, but remember that other metrics exist too.

***


***
## Difference Between Metric and Loss Function

It is also worth mentioning that a metric is different from a loss function. Loss functions measure model performance and are used to train a machine learning model (through some kind of optimization), so they are usually **differentiable** in the model’s parameters. Metrics, on the other hand, are used to monitor and measure the performance of a model (during training and testing) and do not need to be differentiable. However, if the performance metric for a task is differentiable, it can be used both as a loss function (perhaps with some regularization added to it) and as a metric; MSE is one example.
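As a minimal sketch of MSE playing both roles, the cell below evaluates the same small batch once with the loss class and once with the metric class. The input values are made up for illustration and are not from any dataset.

```python
import tensorflow as tf

# Illustrative values only: two samples with two outputs each.
y_true = tf.constant([[0.0, 1.0], [0.0, 0.0]])
y_pred = tf.constant([[1.0, 1.0], [1.0, 0.0]])

# MSE as a loss: a differentiable scalar that an optimizer could minimize.
mse_loss = tf.keras.losses.MeanSquaredError()
loss_value = float(mse_loss(y_true, y_pred))

# MSE as a metric: a stateful object that accumulates results across batches.
mse_metric = tf.keras.metrics.MeanSquaredError()
mse_metric.update_state(y_true, y_pred)
metric_value = float(mse_metric.result())

print(loss_value, metric_value)
```

Both calls report the same number for a single batch. The practical difference is that the metric object keeps a running state until `reset_state()` is called, while the loss is recomputed on every call.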

***

## Confusion Matrix 

**Remember that the confusion matrix is not a metric, but it is an important concept to learn.**

One of the key concepts in classification performance is the confusion matrix, also known as the error matrix, which is a tabular visualization of the model predictions versus the ground-truth labels. Each row of the confusion matrix represents the instances in a predicted class and each column represents the instances in an actual class.

Let’s go through this with an example. Let’s assume we are building a binary classifier to separate cat images from non-cat images, and that our test set has 1100 images (1000 non-cat images and 100 cat images), with the confusion matrix below.

![ConfusionMatrix](image6.png)


- **Out of 100 cat images**, the model has predicted 90 correctly and has misclassified 10. If we refer to the “cat” class as positive and the non-cat class as negative, then the 90 samples predicted as cat are true positives, and the 10 samples predicted as non-cat are false negatives.
- **Out of 1000 non-cat images**, the model has classified 940 correctly and misclassified 60. The 940 correctly classified samples are referred to as true negatives, and the 60 misclassified samples as false positives.

As we can see, the diagonal elements of this matrix denote the correct predictions for each class, while the off-diagonal elements denote the misclassified samples.
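The counts above can be reproduced with a few lines of NumPy. This is only a sketch: the label arrays are rebuilt from the stated counts (1 = cat, 0 = non-cat), and the matrix is indexed as `cm[predicted, actual]` to follow the row/column convention described in this section.

```python
import numpy as np

# Rebuild label arrays from the example counts (1 = cat, 0 = non-cat).
# 100 actual cats: 90 predicted cat, 10 predicted non-cat.
# 1000 actual non-cats: 60 predicted cat, 940 predicted non-cat.
y_true = np.array([1] * 100 + [0] * 1000)
y_pred = np.array([1] * 90 + [0] * 10 + [1] * 60 + [0] * 940)

# Rows are predicted classes, columns are actual classes.
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (y_pred, y_true), 1)

tn, fn = cm[0]  # predicted non-cat: actual non-cat, actual cat
fp, tp = cm[1]  # predicted cat: actual non-cat, actual cat

print(cm)
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```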


***
## Classification metrics based on True/False positives & negatives

### Classification Accuracy

Classification Accuracy is measured using the relationship

\begin{equation}
Accuracy = \frac{Number\ of\ Correct\ Predictions}{Total\ Number\ of\ Predictions}
\end{equation}

In Keras, `tf.keras.metrics.Accuracy(name="accuracy", dtype=None)` can be used to calculate it.
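As a rough check on the cat vs non-cat example, the sketch below rebuilds label arrays from the confusion matrix counts (1 = cat, 0 = non-cat) and computes accuracy both by hand and with `tf.keras.metrics.Accuracy`. The array construction is an assumption made for illustration; only the counts come from the example.

```python
import numpy as np
import tensorflow as tf

# Label arrays rebuilt from the example counts (1 = cat, 0 = non-cat).
y_true = np.array([1] * 100 + [0] * 1000)
y_pred = np.array([1] * 90 + [0] * 10 + [1] * 60 + [0] * 940)

# By hand: correct predictions divided by all predictions, (90 + 940) / 1100.
manual_accuracy = float(np.mean(y_true == y_pred))

# With Keras: Accuracy counts how often y_pred equals y_true.
m = tf.keras.metrics.Accuracy()
m.update_state(y_true, y_pred)
keras_accuracy = float(m.result())

print(manual_accuracy, keras_accuracy)
```

Both values should come out at roughly 93.6%.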

***



### Precision class

There are many cases in which classification accuracy is not a good indicator of your model performance. 

One of these scenarios is when your class distribution is imbalanced (one class is more frequent than the others). In this case, even if you predict all samples as the most frequent class, you would get a high accuracy rate, which is misleading because your model is not learning anything and is just predicting everything as the top class.

For example, in our cat vs non-cat classification above, if the model predicts all samples as non-cat, it would achieve an accuracy of 1000/1100 = 90.9%. Therefore we need to look at class-specific performance metrics too. **Precision** is one such metric, which is defined as:

\begin{equation}
Precision= \frac{True\_Positive}{True\_Positive+ False\_Positive}
\end{equation}

The precision of Cat and Non-Cat class in the above example can be calculated as:

\begin{equation}
Precision\_cat= \frac{samples \ correctly\  predicted \ cat}{samples\ predicted\ as\ cat} = \frac{90}{90+60} = 60\% 
\end{equation}

\begin{equation}
Precision\_NonCat= \frac{940}{950}= 98.9\%
\end{equation}

As we can see, the model has much higher precision for non-cat samples than for cats. This is not surprising: the model has likely seen more examples of non-cat images during training, making it better at classifying that class.
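The two precision values can be checked with plain arithmetic. The helper `precision` below is a hypothetical function written for this sketch, not part of Keras.

```python
def precision(true_positive, false_positive):
    """Fraction of samples predicted as a class that really belong to it."""
    return true_positive / (true_positive + false_positive)

# Cat as the positive class: 90 correct out of 90 + 60 predicted as cat.
precision_cat = precision(true_positive=90, false_positive=60)

# Non-cat as the positive class: 940 correct out of 940 + 10 predicted as non-cat.
precision_non_cat = precision(true_positive=940, false_positive=10)

print(f"{precision_cat:.1%} {precision_non_cat:.1%}")
```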


#### In Code
In Keras, we can use `tf.keras.metrics.Precision(thresholds=None, top_k=None, class_id=None, name=None, dtype=None)`. Let's see a few examples:



**Example 1:**

In [3]:
m = tf.keras.metrics.Precision()
m.update_state([0, 1, 1, 1], [1, 0, 1, 1])
m.result().numpy()


0.6666667

**Example 2:**

In [5]:
m = tf.keras.metrics.Precision()
m.update_state([0, 1, 1, 1], [1, 0, 1, 1], sample_weight=[0, 0, 1, 0])
m.result().numpy()

1.0

**Example 3:**

In [6]:
m = tf.keras.metrics.Precision(top_k=2)
m.update_state([0, 0, 1, 1], [1, 1, 1, 1])
m.result().numpy()

0.0

**Example 4:**
With `top_k=4`, it will calculate precision over `y_true[:4]` and `y_pred[:4]`.


In [7]:
m = tf.keras.metrics.Precision(top_k=4)
m.update_state([0, 0, 1, 1], [1, 1, 1, 1])
m.result().numpy()

0.5
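The examples above do not exercise the `thresholds` argument. As a hedged sketch, the cell below feeds made-up probabilities (`y_prob` is illustrative, not model output) and compares the default threshold of 0.5 with a stricter one. A prediction counts as positive only when it is above the threshold.

```python
import tensorflow as tf

# Illustrative labels and predicted probabilities.
y_true = [0, 1, 0, 1]
y_prob = [0.4, 0.6, 0.8, 0.9]

# Default threshold (0.5): 0.6, 0.8 and 0.9 count as positive predictions.
default_precision = tf.keras.metrics.Precision()
default_precision.update_state(y_true, y_prob)
default_value = float(default_precision.result())

# Stricter threshold: only 0.9 counts as a positive prediction.
strict_precision = tf.keras.metrics.Precision(thresholds=0.85)
strict_precision.update_state(y_true, y_prob)
strict_value = float(strict_precision.result())

print(default_value, strict_value)
```

Raising the threshold removes the one false positive here, so precision goes up. On real data this usually comes at the cost of missing more true positives.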

### Recall class

Recall is another important metric, defined as the fraction of samples from a class that the model predicts correctly:

\begin{equation}
Recall= \frac{True\_Positive}{True\_Positive + False\_Negative}
\end{equation}

Therefore, for our example above, the recall rate of cat and non-cat classes can be found as:

\begin{equation}
Recall\_cat= \frac{90}{100}= 90\%
\end{equation}

\begin{equation}
Recall\_NonCat= \frac{940}{1000}= 94\%
\end{equation}
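In Keras, recall is available as `tf.keras.metrics.Recall`. The sketch below applies it to label arrays rebuilt from the example counts (1 = cat, 0 = non-cat). Flipping the labels to treat non-cat as the positive class is a choice made for this illustration.

```python
import numpy as np
import tensorflow as tf

# Label arrays rebuilt from the example counts (1 = cat, 0 = non-cat).
y_true = np.array([1] * 100 + [0] * 1000)
y_pred = np.array([1] * 90 + [0] * 10 + [1] * 60 + [0] * 940)

# Recall with cat as the positive class: TP / (TP + FN) = 90 / 100.
recall_cat = tf.keras.metrics.Recall()
recall_cat.update_state(y_true, y_pred)
recall_cat_value = float(recall_cat.result())

# Flip the labels so that non-cat becomes the positive class: 940 / 1000.
recall_non_cat = tf.keras.metrics.Recall()
recall_non_cat.update_state(1 - y_true, 1 - y_pred)
recall_non_cat_value = float(recall_non_cat.result())

print(recall_cat_value, recall_non_cat_value)
```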


The remaining metrics in this category are:

- AUC class
- TruePositives class
- TrueNegatives class
- FalsePositives class
- FalseNegatives class
- PrecisionAtRecall class
- SensitivityAtSpecificity class
- SpecificityAtSensitivity class

# References

1. [Metrics](https://keras.io/api/metrics/)
2. [20 Popular Machine Learning Metrics.](https://towardsdatascience.com/20-popular-machine-learning-metrics-part-1-classification-regression-evaluation-metrics-1ca3e282a2ce)