# Statistical Measures of Performance

This notebook is intended to provide a brief introduction to statistical measures of performance (accuracy, precision, recall (sensitivity) and specificity).

These four statistical measures give a good summary of your results because they are all built from four counts:<br>
- True Positive (TP): Your result says "True" and your reference says "True".
- True Negative (TN): Your result says "False" and your reference says "False".
- False Positive (FP): Your result says "True" and your reference says "False".
- False Negative (FN): Your result says "False" and your reference says "True".
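
The four indicators above can be counted directly from two boolean sequences. The following is a minimal sketch, not part of the original notebook; `confusion_counts`, `reference` and `result` are hypothetical names and the data is made up for illustration.

```python
import numpy as np

def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for two boolean sequences of equal length."""
    actual = np.asarray(actual, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = int(np.sum(actual & predicted))      # result True,  reference True
    tn = int(np.sum(~actual & ~predicted))    # result False, reference False
    fp = int(np.sum(~actual & predicted))     # result True,  reference False
    fn = int(np.sum(actual & ~predicted))     # result False, reference True
    return tp, tn, fp, fn

# Hypothetical toy data
reference = [True, True, False, False, True, False]
result = [True, False, False, True, True, False]
print(confusion_counts(reference, result))  # (2, 2, 1, 1)
```

Every metric in this notebook is a ratio of some of these four counts.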

Note that these statistical measures are defined for binary classification (True/False, as in the list above). They can also be used in a multiclass problem: pick one class as the positive class and treat all the remaining classes as a single negative group.
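
A minimal sketch of that one-vs-rest idea, assuming made-up labels; `one_vs_rest` and the animal labels are hypothetical and only illustrate how a multiclass problem is reduced to True/False before computing a metric such as recall.

```python
import numpy as np

def one_vs_rest(labels, positive_class):
    """True where the label equals positive_class, False for every other class."""
    return np.asarray(labels) == positive_class

# Hypothetical multiclass labels
actual_labels = ["cat", "dog", "bird", "cat", "dog", "cat"]
predicted_labels = ["cat", "cat", "bird", "cat", "dog", "dog"]

# Treat "cat" as the positive class and group "dog" and "bird" as negatives.
actual_cat = one_vs_rest(actual_labels, "cat")
predicted_cat = one_vs_rest(predicted_labels, "cat")

tp = int(np.sum(actual_cat & predicted_cat))
fn = int(np.sum(actual_cat & ~predicted_cat))
recall_cat = tp / (tp + fn)
print(f"Recall for 'cat': {recall_cat:.2f}")
```

Repeating the same steps with each class as the positive one gives one set of metrics per class.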

## Sensitivity / recall

### Definition
It is the ratio between the samples correctly classified as Positive and all the samples that are actually Positive.<br>
Sensitivity tells you what fraction of the actual positives was detected. It is especially useful in applications where missing a True is costlier than raising a false alarm. For example, in safety functions for autonomous driving, an unnecessary alert is preferable to a missing alert when one was needed. High sensitivity is crucial for safety functions.


### Formula / Procedure to find it

The equation to compute the sensitivity is as follows: $Sensitivity = \frac{TP}{TP + FN}$ <br>

### Visualization

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots

### Creating the data set

In [None]:
data_set = [5,  2,  6, 10,  9,  3,  1,  9,  6,  2,  1,  4,  6,  5,  0,  4,  6,
            5,  4,  6,  3,  0,  2,  8,  6,  7,  8,  0,  4,  7,  0,  1,  0,  8,
            2,  0, 10,  2,  6,  6,  0,  5,  2,  0, 10,  3,  9,  8,  4,  7,  6,
            1, 10,  7, 10,  3,  0,  6,  5,  8,  4,  3,  7,  3,  1,  5,  3,  0,
            3,  3,  1,  2,  1,  5,  0,  5,  8,  1, 10,  7,  8,  6,  9,  3,  3,
            7,  3,  4, 10,  8,  2,  0,  0,  2,  9,  0,  5,  5,  5,  6]

Imagine that we want an alert every time we get a data point with a value of 5 or greater.<br>
To compare sensitivity we create two filters: a correct one (`>= 5`) and an incorrect one that leaves out the value 5 and only accepts greater numbers (`> 5`).

In [None]:
def sensitivity(actual, predicted, verbosity=True):
    # Only actual positives (a is True) count here; compare by value (==), not identity (is).
    true_positives = np.sum([a == p for a,p in zip(actual,predicted) if a])
    false_negatives = np.sum([a != p for a,p in zip(actual,predicted) if a])
    sens = true_positives/(true_positives+false_negatives)
    if verbosity:
        print(f"True positives: {true_positives}\nFalse negatives: {false_negatives}\nSensitivity: {sens}\n")
    return sens

def alert_trigger_non_inclusive(data):
    return np.array([d>5 for d in data])

def alert_trigger_inclusive(data):
    return np.array([d>=5 for d in data])

In [None]:
actual = alert_trigger_inclusive(data_set)
pred = alert_trigger_non_inclusive(data_set)
sensitivity(actual, pred)

We got a sensitivity of 78%.<br>
Leaving a single boundary value (5) out of the condition was enough to miss 22% of the positives.
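
As a hand check, assuming a manual count of `data_set` (worth confirming by running the cell above): 39 values are greater than 5 and 11 values are exactly 5. The non-inclusive filter therefore catches 39 of the 50 actual positives and misses the 11 fives:

$$Sensitivity = \frac{TP}{TP + FN} = \frac{39}{39 + 11} = 0.78$$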

In [None]:
def alert_trigger_inclusive_threshold(data, thresh):
    return np.array([d>=thresh for d in data])

def threshold(min_t, max_t, data, func):
    df = pd.DataFrame()
    measure = list()
    current_thresh = list()
    actual = alert_trigger_inclusive(data)
    for t in range(min_t,max_t):
        pred = alert_trigger_inclusive_threshold(data, t)
        measure.append(func(actual, pred, verbosity=False))
        current_thresh.append(t)
    df['func'] = measure
    df['threshold'] = current_thresh
    return df

In [None]:
df_sens = threshold(0, 10, data_set, sensitivity)
fig = go.Figure(data=go.Scatter(x=df_sens['threshold'],
                                y=df_sens["func"],
                                text=[f"Threshold: {_}" for _ in df_sens["threshold"]],
                                mode='markers+lines'))
fig.update_layout(title_text="Behaviour of Sensitivity with different thresholds",
                  xaxis=dict(title="Threshold"),
                  yaxis=dict(title="Sensitivity")
                  )

fig.show()

Pay special attention to sensitivity if your application is critical and missing a True result is the costlier error.

## Specificity

### Definition
Specificity is the counterpart of sensitivity for the negative class. It is the ratio between the samples correctly classified as negative and all the samples that are actually negative.

### Formula / Procedure to find it

The equation to compute the specificity is as follows: $Specificity = \frac{TN}{FP + TN}$ <br>

### Creating the data set

We will be using the same dataset (previously created).

In [None]:
def specificity(actual, predicted, verbosity=True):
    # Only actual negatives (a is False) count here; compare by value (==), not identity (is).
    true_negatives = np.sum([a == p for a,p in zip(actual,predicted) if not a])
    false_positives = np.sum([a != p for a,p in zip(actual,predicted) if not a])
    spec = true_negatives/(true_negatives+false_positives)
    if verbosity:
        print(f"True negatives: {true_negatives}\nFalse positives: {false_positives}\nSpecificity: {spec}\n")
    return spec

In [None]:
actual = alert_trigger_inclusive(data_set)
pred = alert_trigger_non_inclusive(data_set)
specificity(actual, pred)
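
A hand check of this result, assuming a manual count of `data_set` (50 values are below 5): neither filter raises an alert for a value below 5, so there are no false positives.

$$Specificity = \frac{TN}{FP + TN} = \frac{50}{0 + 50} = 1$$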

In [None]:
df_spec = threshold(0, 10, data_set, specificity)
fig = go.Figure(data=go.Scatter(x=df_spec['threshold'],
                                y=df_spec["func"],
                                text=[f"Threshold: {_}" for _ in df_spec["threshold"]],
                                mode='markers+lines'))
fig.update_layout(title_text="Behaviour of Specificity with different thresholds",
                  xaxis=dict(title="Threshold"),
                  yaxis=dict(title="Specificity")
                  )

fig.show()

As we can see, this classifier scores better at classifying negatives correctly than positives. Ideally we want both, and a good classifier needs a balance between sensitivity and specificity, but depending on the application you can prioritize the metric you need.<br>
These two metrics (sensitivity and specificity) together give a really useful visualization: the ROC (Receiver Operating Characteristic) curve.<br>
To draw it, the classifier has to accept a threshold as a parameter.

In [None]:
def alert_trigger_inclusive_threshold(data, thresh):
    return np.array([d>=thresh for d in data])

def threshold_roc(min_t, max_t, data):
    df = pd.DataFrame()
    sen_t = list()
    spe_t = list()
    current_thresh = list()
    actual = alert_trigger_inclusive(data)
    for t in range(min_t,max_t):
        pred = alert_trigger_inclusive_threshold(data, t)
        sen_t.append(sensitivity(actual, pred, verbosity=False))
        spe_t.append(specificity(actual, pred, verbosity=False))
        current_thresh.append(t)
    df['sensitivity'] = sen_t
    df['specificity'] = spe_t
    df['threshold'] = current_thresh
    return df

In [None]:
df = threshold_roc(0, 11, data_set)

fig = go.Figure(data=go.Scatter(x=1-df['specificity'],
                                y=df["sensitivity"],
                                text=[f"Threshold: {_}" for _ in df["threshold"]],
                                mode='markers+lines'))
fig.update_layout(title_text="ROC for integer classification thresholds in [0,10]",
                  xaxis=dict(title="1 - Specificity"),
                  yaxis=dict(title="Sensitivity")
                  )

fig.show()

As you can see in the plot, the closer we get to the threshold of 5 (the one we know is the right one), the better the results for sensitivity and specificity tend to be, reaching a perfect score in both measures at threshold 5.<br>
In this case we got a perfect score. In real applications, where a perfect score is unlikely, we can judge how good our classifier is by finding the best combination of sensitivity and specificity.
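
One common way to pick a "best combination" is Youden's J statistic, $J = Sensitivity + Specificity - 1$. This is not the notebook's own method but a technique swapped in for illustration. The sketch below assumes made-up data in which the reference labels cannot be separated perfectly by value; `youden_j`, `values` and `labels` are hypothetical names.

```python
import numpy as np

def youden_j(actual, predicted):
    """Youden's J = sensitivity + specificity - 1 for boolean arrays."""
    actual = np.asarray(actual, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    sens = np.sum(actual & predicted) / np.sum(actual)
    spec = np.sum(~actual & ~predicted) / np.sum(~actual)
    return float(sens + spec - 1)

# Hypothetical data: the value 7 is labelled False, so no threshold is perfect.
values = np.array([1, 3, 4, 5, 6, 8, 2, 7])
labels = np.array([False, False, True, True, True, True, False, False])

thresholds = list(range(0, 10))
scores = [youden_j(labels, values >= t) for t in thresholds]
best_index = int(np.argmax(scores))  # the first maximum wins on ties
best_threshold = thresholds[best_index]
best_j = scores[best_index]
print(f"Best threshold: {best_threshold} (J = {best_j:.2f})")
```

J weighs sensitivity and specificity equally. If one kind of error is costlier in your application, a weighted criterion may suit it better.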

Pay special attention to specificity if your application is critical and a false alarm (missing a False result) is the costlier error.

## Precision

### Definition
Precision is the fraction of real True cases among all the cases the classifier labels as True. In other words, it tells us how much we can trust the classifier when it gives a True statement.<br>
It is the ratio between the True Positives and the sum of True Positives and False Positives.

### Formula / Procedure to find it

The equation to compute the precision is as follows: $Precision = \frac{TP}{TP + FP}$ <br>
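
Precision and sensitivity (recall) share the same numerator but react to different errors. The sketch below, on made-up data, compares a filter that alerts too often with one that alerts too rarely; `precision_and_recall` and `sample` are hypothetical names, and the function assumes at least one predicted positive and one actual positive (otherwise a denominator is zero).

```python
import numpy as np

def precision_and_recall(actual, predicted):
    """Return (precision, recall) for two boolean arrays."""
    actual = np.asarray(actual, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    tp = np.sum(actual & predicted)
    fp = np.sum(~actual & predicted)
    fn = np.sum(actual & ~predicted)
    return float(tp / (tp + fp)), float(tp / (tp + fn))

# Hypothetical sample; the reference rule is "value >= 5".
sample = np.array([2, 4, 4, 5, 6, 7, 3, 9])
actual = sample >= 5

over_alerting = sample >= 4   # also fires on the two 4s (false positives)
under_alerting = sample > 5   # misses the 5 (false negative)

p_over, r_over = precision_and_recall(actual, over_alerting)
p_under, r_under = precision_and_recall(actual, under_alerting)
print(f"Over-alerting:  precision={p_over:.2f}, recall={r_over:.2f}")
print(f"Under-alerting: precision={p_under:.2f}, recall={r_under:.2f}")
```

False positives lower precision and leave recall untouched, while false negatives do the opposite.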

In [None]:
def precision(actual, predicted, verbosity=True):
    # Compare by value (==), not identity (is).
    true_positives = np.sum([a == p for a,p in zip(actual,predicted) if a])
    false_positives = np.sum([a != p for a,p in zip(actual,predicted) if not a])
    prec = true_positives/(true_positives+false_positives)
    if verbosity:
        print(f"True positives: {true_positives}\nFalse positives: {false_positives}\nPrecision: {prec}\n")
    return prec

In [None]:
actual = alert_trigger_inclusive(data_set)
pred = alert_trigger_non_inclusive(data_set)
precision(actual, pred)

In [None]:
df_prec = threshold(0, 10, data_set, precision)
fig = go.Figure(data=go.Scatter(x=df_prec['threshold'],
                                y=df_prec["func"],
                                text=[f"Threshold: {_}" for _ in df_prec["threshold"]],
                                mode='markers+lines'))
fig.update_layout(title_text="Behaviour of Precision with different thresholds",
                  xaxis=dict(title="Threshold"),
                  yaxis=dict(title="Precision")
                  )

fig.show()

Use precision to know how much you can trust a True result from your classifier.

## Accuracy

### Definition
Accuracy gives an overall indication of how well the classification performs: take all the correct classifications (True or False) and divide by the total number of samples.

### Formula / Procedure to find it

The equation to compute the accuracy is as follows: $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ <br>
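
Because the denominator is simply the number of samples, accuracy can also be read as the fraction of positions where the result agrees with the reference. A minimal sketch on made-up data follows; `accuracy_from_agreement` and `sample` are hypothetical names.

```python
import numpy as np

def accuracy_from_agreement(actual, predicted):
    """Accuracy as the mean of element-wise agreement between two boolean arrays."""
    actual = np.asarray(actual, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    return float(np.mean(actual == predicted))

# Hypothetical sample; the reference rule is "value >= 5".
sample = np.array([0, 5, 7, 3, 5, 10, 1, 6, 2, 9])
actual = sample >= 5
predicted = sample > 5   # disagrees with the reference only on the two 5s

acc = accuracy_from_agreement(actual, predicted)
print(f"Accuracy: {acc:.2f}")
```

The two disagreements out of ten samples are both false negatives, which accuracy alone does not reveal.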

In [None]:
def accuracy(actual, predicted, verbosity=True):
    # Compare by value (==), not identity (is).
    true_positives = np.sum([a == p for a,p in zip(actual,predicted) if a])
    true_negatives = np.sum([a == p for a,p in zip(actual,predicted) if not a])
    false_positives = np.sum([a != p for a,p in zip(actual,predicted) if not a])
    false_negatives = np.sum([a != p for a,p in zip(actual,predicted) if a])
    acc = (true_positives + true_negatives)/(true_positives+true_negatives+false_positives+false_negatives)
    if verbosity:
        # Adjacent f-strings are concatenated, which avoids the stray space print() adds between arguments.
        print(f"True positives: {true_positives}\nTrue negatives: {true_negatives}\n"
              f"False positives: {false_positives}\nFalse negatives: {false_negatives}\nAccuracy: {acc}\n")
    return acc

In [None]:
actual = alert_trigger_inclusive(data_set)
pred = alert_trigger_non_inclusive(data_set)
accuracy(actual, pred)

In [None]:
df_acc = threshold(0, 10, data_set, accuracy)
fig = go.Figure(data=go.Scatter(x=df_acc['threshold'],
                                y=df_acc["func"],
                                text=[f"Threshold: {_}" for _ in df_acc["threshold"]],
                                mode='markers+lines'))
fig.update_layout(title_text="Behaviour of Accuracy with different thresholds",
                  xaxis=dict(title="Threshold"),
                  yaxis=dict(title="Accuracy")
                  )

fig.show()

Note that accuracy behaves differently from the other metrics: it peaks at the right threshold and then goes down again.<br>
In conclusion, accuracy gives you an overall idea of the result. If you want more detail, you can use sensitivity and specificity to build the ROC curve and determine the threshold that best suits your needs.

## Exercise

You can test what you have learned from the introduction to central tendency measures next:

In [None]:
class test:
    def __init__(self):
        self.questions = list()
        self.answers = list()
        self.correct_answers = 0
        self.score = 0

    def add_element(self, q, a):
        self.questions.append(q)
        self.answers.append(a)

    def remove_element(self, index):
        self.questions.pop(index)
        self.answers.pop(index)
        
    def show_answer(self, index):
        print(f"Q{index}: {self.questions[index-1]} - Ans_{index}: {self.answers[index-1]}")
    
    def show_answers(self):
        for index, (q, a) in enumerate(zip(self.questions, self.answers)):
            print(f"Q{index+1}: {q} - Ans_{index+1}: {a}")
    
    def build_from_csv(self, filename):
        df = pd.read_csv(filename)
        for index in range(df.shape[0]):
            self.add_element(df['Questions'][index], df['Answers'][index])
    
    def visualize_score(self):
        fig = go.Figure(data=[go.Pie(labels=["Correct", "Incorrect"],
                                     values=[self.score, 100-self.score],
                                     marker_colors=['rgb(10,100,10)', 'rgb(230,70,70)'],
                                     hole=.3)])
        fig.show()

    def test(self):
        self.correct_answers = 0
        for index, (q, a) in enumerate(zip(self.questions, self.answers)):
            current_answer = ''
            while len(str(current_answer))==0:
                current_answer = input(f"Q{index+1}: " + q)
                if len(current_answer)>0:
                    current_answer = np.round(float(current_answer),2)
                    self.correct_answers += int(current_answer == a)
                    if a==current_answer:
                        print("Correct")
                    else:
                        print("Incorrect")
        # correct_answers is already a plain integer count, so no np.sum is needed.
        self.score = 100*self.correct_answers/len(self.questions)
        
        print(f"Your score: {self.score}")
        self.visualize_score()

In [None]:
exam = test()
exam.build_from_csv("https://raw.githubusercontent.com/Ricardo-DG/data_analytics_training/main/measures_test.csv")

In [None]:
# If you would like to see the answers uncomment and run the following line

# exam.show_answers()

In [None]:
# If you would like to see a specific answer uncomment and run the following line
# (make sure to replace "index" with the number of the question you want to know the answer).

# exam.show_answer(index)

In [None]:
exam.test()
score = exam.score  # test() returns None; the result is stored in exam.score