# Metrics
Metrics in machine learning refer to the quantitative measurements used to evaluate the performance of a model. Metrics can be used to assess different aspects of a model's performance, such as accuracy, precision, recall, F1-score, AUC-ROC, and others. Choosing the right metrics for a given problem is important to ensure that the model's performance is appropriately evaluated and improved.

<img src = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT6F58yqU8uacM3fV56Thy1CGgc-qRTBWocOA&usqp=CAU">

|Classification|Further Types|Formula
|---|---|---|
|Accuracy|Accuracy Score|$$\frac {T_P + T_N}{n}$$
||Balanced Accuracy Score|$$\frac {Sensitivity + Specificity}{2}$$
||Top K Accuracy|$$\frac {1}{n} \sum_{i=1}^{n} 1[y_i \in top_k(\hat{f}_i)]$$
|Precision|Average Precision Score|$$\frac {T_P}{T_P + F_P}$$

In [1]:
import numpy as np
import pandas as pd 

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Let's assume you are taking an `MCQ (Multiple Choice Question)` test. There are $4$ questions, each answered with `True` or `False`.

## Will a Donkey be able to ride a Horse ??
<img src = "https://i.redd.it/lemeny69kd161.jpg">

## Will the Bank accept Monopoly Money ??
<img src = "https://www.meme-arsenal.com/memes/8b66635dd74ce0925f0414603b210f6f.jpg">

## Will a Dog be able to ride a SuperBike ??
<img src = "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRuBKEDHvyqHRCwFBvf8yW43f7Qj_JBBuXYQ0hjpqkNSMCkrJ5XSQ9brpHVGDj-0tyl9dY&usqp=CAU">

## Will I be able to find my Socks ??
<img src = "https://i.pinimg.com/236x/4b/57/1a/4b571a419e9345aaa7798b2a00e87fd9--sock-memes.jpg">

You give the answers like this 

|Question|Answer
|---|---
|Will a Donkey be able to ride a Horse ??|False
|Will the Bank accept Monopoly Money ??|True
|Will a Dog be able to ride a SuperBike ??|True
|Will I be able to find my Socks ??|True

But the actual answers were 

|Question|Answer
|---|---
|Will a Donkey be able to ride a Horse ??|True
|Will the Bank accept Monopoly Money ??|True
|Will a Dog be able to ride a SuperBike ??|True
|Will I be able to find my Socks ??|False

# 1 | Accuracy 

# 1.1 | Accuracy Score 

So your teacher says that you gave $2$ `correct answers` and $2$ `incorrect answers`. 

The teacher gave you marks like this $\frac {2}{4} = 0.5$

We can think of this as $\frac {\text{correct ones}}{\text{total ones}}$

**Yayyyyyyyyyy!!!!**, We have found a new mathematical formula 🥳🥳. 

*Let's give it a fancy name. Wait !!!. Let me use my rusty brain...*

...

...

...

Okay I got a name, the **Accuracy Score** 😎😎

But there is a problem with this one, `it is simple and easy to understand`. Let's make it `difficult and hard to understand`. 

One way is to make a complicated chart like this 

|||Actual Answers|Actual Answers|Total
|---|---|---|---|---|
|||False|True|
|Your Answers|False|$0$|$1$|$0 + 1 = 1$|
|Your Answers|True|$1$|$2$|$1 + 2 = 3$|
|Total||$0+1 = 1$|$1+2=3$|$1 + 3 = 4$|

We call this **Bad Boy** the `Confusion Matrix`
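The cells of the chart above can also be counted in code. This is only a minimal sketch: the variable names `your_answers`, `actual_answers` and `cells` are made up for this illustration, and the answer lists are the four quiz answers from the tables above.

```python
from collections import Counter

# The four quiz answers from the tables above
your_answers = ["False", "True", "True", "True"]
actual_answers = ["True", "True", "True", "False"]

# Count how often each (your answer, actual answer) pair occurs
cells = Counter(zip(your_answers, actual_answers))

# Print one row of the chart per value of "Your Answers"
for yours in ("False", "True"):
    row = [cells[(yours, actual)] for actual in ("False", "True")]
    print(yours, row, "total:", sum(row))
```

Each printed row lists the count for actual `False` first and actual `True` second, in the same order as the chart.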

We introduce more complex terms in this bad boy. Treat `True` as $1$ and `False` as $0$:
* `True Positives` - When both `Actual Answers` and `Your Answers` say $1$
* `True Negatives` - When both `Actual Answers` and `Your Answers` say $0$
* `False Positives` - When `Actual Answers` says $0$ but `Your Answers` says $1$
* `False Negatives` - When `Actual Answers` says $1$ but `Your Answers` says $0$

Also we needlessly update our formula $$\text{Accuracy Score} = \frac {\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives}}$$

This formula is for the common people. We are the lijhandary inventors of this formula. So we have the privilege to write it as 

$$AS = \frac {T_P + T_N}{T_P + T_N + F_P + F_N}$$
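To see the four terms in action, here is a small sketch that counts them for the quiz answers and plugs them into the formula. The helper name `confusion_counts` and the variables `teacher` and `ours` are made up for this example, and `True` is assumed to be the positive ($1$) label.

```python
def confusion_counts(actuals, predictions, positive="True"):
    """Count T_P, T_N, F_P and F_N, treating `positive` as the 1 label."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(actuals, predictions):
        if predicted == positive:
            if actual == positive:
                tp += 1   # we said 1 and the answer was 1
            else:
                fp += 1   # we said 1 but the answer was 0
        else:
            if actual == positive:
                fn += 1   # we said 0 but the answer was 1
            else:
                tn += 1   # we said 0 and the answer was 0
    return tp, tn, fp, fn


teacher = ["True", "True", "True", "False"]
ours = ["False", "True", "True", "True"]

tp, tn, fp, fn = confusion_counts(teacher, ours)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)  # 2 0 1 1 0.5
```

The result matches the $\frac {2}{4} = 0.5$ the teacher gave us.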

By chance we are also programmers. So our second last brain cell asks: what if we code this **Bad Boy...?**

Let's try to find out!!!

# 1.1.1 | Accuracy Score From Scratch

This is the list of answers we gave 

In [2]:
our_legendary_answers = ["False" , "True" , "True" , "True"]

And this is the list teacher gave 

In [3]:
teacher_answers = ["True" , "True" , "True" , "False"]

We initialize a variable `correct_ones` to keep track of how many answers were correct 

In [4]:
correct_ones = 0

Now we run a for loop and check if we got the correct ones 

In [5]:
for our_answers , teacher_ones in zip(our_legendary_answers , teacher_answers):
    if our_answers == teacher_ones:
        correct_ones += 1

In [6]:
correct_ones

2

And that is right we gave $2$ correct answers. Now we just need to divide this by the total number of questions 

In [7]:
correct_ones/len(teacher_answers)

0.5

And this is our **Accuracy Score** for the test. 

By chance we are also kind people. So we think of other people too. So we will make a user friendly function for other users. 

The function will take $2$ lists, `actuals` and `predictions`, and spit out the `accuracy_score`. 

# 1.1.2 | Accuracy Score Final Source Code

In [8]:
def accuracy_score(actuals , predictions):
    
    correct_ones = 0
    
    for actual  , predicted in zip(actuals , predictions):
        
        if predicted == actual:
            
            correct_ones += 1

    asc = correct_ones / len(predictions)

    return asc

**Jokes apart, this was actually a small surface depiction of how the [sklearn.metrics.accuracy_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) works internally. Kudos to them !!!**
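Once the answers are NumPy arrays, the same idea fits in one line. This is just a sketch for comparison, not how any library implements it; the name `accuracy_np` is made up here.

```python
import numpy as np


def accuracy_np(actuals, predictions):
    # Element-wise comparison gives an array of True/False values;
    # the mean of that array is the fraction of matching answers.
    return float(np.mean(np.asarray(actuals) == np.asarray(predictions)))


teacher = ["True", "True", "True", "False"]
ours = ["False", "True", "True", "True"]

score = accuracy_np(teacher, ours)
print(score)  # 0.5
```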

# 1.2 | Balanced Accuracy Score 

Suppose you have the `our_legendary_answers` list as 

In [9]:
our_legendary_answers = ["True" , "True" , "True" , "True" , "True" , "True"]

And `teacher_answers` as 

In [10]:
teacher_answers = ["True" , "True" , "True" , "True" , "True" , "False"]

If we calculate the `accuracy_score` for this, we get 

In [11]:
accuracy_score(teacher_answers , our_legendary_answers)

0.8333333333333334

And we have a really great accuracy, because we were correct most of the time. 

But notice one thing: rather than actually learning, we just gave the same answer every time. In reality we didn't learn anything. This happened because the data was not varied: the number of `False` answers was far smaller than the number of `True` answers.  

**The drawback of `accuracy_score` is that it cannot handle imbalanced data**
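The effect gets even more dramatic with a bigger test. The sketch below uses invented data ($95$ `True` answers and $5$ `False` answers) and a made-up helper `plain_accuracy` to show how far "always answer `True`" can get.

```python
def plain_accuracy(actuals, predictions):
    # Fraction of positions where the prediction matches the actual answer
    correct = sum(a == p for a, p in zip(actuals, predictions))
    return correct / len(actuals)


# Invented imbalanced test: 95 True answers and only 5 False answers
actuals = ["True"] * 95 + ["False"] * 5

# A "model" that learned nothing and always answers True
always_true = ["True"] * len(actuals)

score = plain_accuracy(actuals, always_true)
print(score)  # 0.95
```

A score of $0.95$ looks impressive, yet every single `False` question was answered wrong.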

To deal with this, we introduce a more complex metric. 

**The Lijhandary Balanced Accuracy Score** 🤯🤯

So the original formula for a particular label is $$as_{True} = \frac {\text{number of times True was predicted}}{\text{total values}}$$

We can change this formula a little bit so that it only looks at the questions whose actual answer is `True` $$as_{True} = \frac {\text{number of times True was predicted}}{\text{number of times True was predicted} + \text{number of times True was not predicted}}$$

$$as_{True} = \frac {\text{number of times True was predicted}}{\text{number of times True was predicted} + \text{number of times another value was falsely predicted}}$$

And the same formula with some tweaks will work for `False` too

We can write this formula as 

$$as_{True} = \frac {T_P}{T_P + F_N}$$
$$as_{False} = \frac {T_N}{T_N+ F_P}$$

Actually we do not call this `accuracy_score` or `as`, we call it `recall`

$$recall_{True} = \frac {T_P}{T_P + F_N}$$
$$recall_{False} = \frac {T_N}{T_N+ F_P}$$

To calculate the `balanced_accuracy_score` we take the mean of all of these values
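As a quick worked check, take the six answers from this section with `True` as the positive label. All $5$ questions whose actual answer is `True` were answered `True`, so $T_P = 5$ and $F_N = 0$. The single question whose actual answer is `False` was answered `True`, so $T_N = 0$ and $F_P = 1$.

$$recall_{True} = \frac {5}{5 + 0} = 1$$
$$recall_{False} = \frac {0}{0 + 1} = 0$$
$$\text{balanced accuracy} = \frac {recall_{True} + recall_{False}}{2} = \frac {1 + 0}{2} = 0.5$$

So the "always answer `True`" strategy drops from an accuracy of about $0.83$ to a balanced accuracy of $0.5$.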

Now let's try to code this thing 

First let's transition from `list` to `arrays`, as they give us more functionality.

# 1.2.1 | Balanced Accuracy Score from Scratch 

In [12]:
predictions = np.array(our_legendary_answers)
actuals = np.array(teacher_answers)

Now we know that we need to compute `recall` for every unique label we have. First let's find out what the unique values are 

In [13]:
np.unique(actuals)

array(['False', 'True'], dtype='<U5')
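`np.unique` can also report how often each label occurs, which makes the imbalance easy to see. A small sketch using the `return_counts=True` option; the names `teacher` and `label_counts` are only for this example.

```python
import numpy as np

teacher = np.array(["True", "True", "True", "True", "True", "False"])

# return_counts=True also returns how many times each unique label appears
labels, counts = np.unique(teacher, return_counts=True)

# Convert to plain Python types for a readable printout
label_counts = {str(label): int(count) for label, count in zip(labels, counts)}
print(label_counts)  # {'False': 1, 'True': 5}
```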

So now we just make a list `recall` with one slot per label. For now each slot only holds the label itself; the actual `recall` values come later 

In [14]:
recall = []
for labels in np.unique(actuals):
    recall.append(labels)

There are two types of terms in the formula: `Number of times True was predicted` and `Number of times another value was falsely predicted`. Both can be calculated easily with the code we wrote for `accuracy_score`; we just need to add a little more functionality to it 

In [15]:
label = np.unique(actuals)[0]
t_p = 0
f_n = 0
for act , pred in zip(actuals , predictions):
    # Only the rows whose actual value is this label count towards its recall
    if act == label:
        if pred == label:
            t_p += 1 
        else :
            f_n += 1

In [16]:
print(f_n)
t_p 

1


0

We could store these values in a dictionary for a more generalised form, like `{label : [T_P , F_N]}`

That would work, but later we want to pick out the counts of a whole row by position, and that is easier with `arrays` than with `dict.values()`. 

Thus we will move to creating `arrays`. 

The array will be of the format `array([[label (encoded) , T_P , F_N] , [label (encoded) , T_N , F_P]])`, one row per label

In [17]:
np.empty(shape = (len(np.unique(actuals)) , 3))

array([[5.05771898e-310, 0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000]])

Now we will populate our array

In [18]:
metrics = np.empty(shape = (len(np.unique(actuals)) , 3))
for i in range(len(np.unique(actuals))):
    x = np.empty(shape = (3))
    true = 0
    false = 0
    for act , pred in zip(actuals , predictions):
        # Only the rows whose actual value is the current label count towards its recall
        if act == np.unique(actuals)[i]:
            if pred == np.unique(actuals)[i]:
                true += 1
            else :
                false += 1
    x[0] = i
    x[1] = true
    x[2] = false
    metrics[i] = x

In [19]:
x

array([1., 5., 0.])

This is the row for the last label and denotes the 

|||
|---|---|
|$Label$|$1$|
|$T_P$|$5$|
|$F_N$|$0$|

In [20]:
metrics

array([[0., 0., 1.],
       [1., 5., 0.]])

And this is just the broader version of this thing, with one row per label 

Now we just need to calculate this formula for each row $$as_{label} = \frac {\text{number of times the label was predicted}}{\text{number of times the label was predicted} + \text{number of times another value was falsely predicted}}$$

and luckily each row holds both of the corresponding values 

In [21]:
recall_0 = metrics[0][1] / (metrics[0][1] + metrics[0][2])

In [22]:
recall_0

0.0

And the recall for the second one will be 

In [23]:
recall_1 = metrics[1][1] / (metrics[1][1] + metrics[1][2])

In [24]:
recall_1

1.0

And the mean of these values will be 

In [25]:
(recall_0 + recall_1) / 2

0.5

And that's actually much lower than the accuracy score we got earlier. Let's now put all of this into a function and have some fun with the homies then 

In [26]:
recall = []
# Each row of `metrics` is [label , correct , missed], so the same
# formula works whether there are 2 labels or more
for row in metrics:
    recall.append(row[1] / (row[1] + row[2]))

Shortcut for calculating the `mean`

In [27]:
recall = np.mean(np.array(recall))

In [28]:
balanced_accuracy_score = recall

# 1.2.2 | Balanced Accuracy Score Final Source Code

In [29]:
def balanced_accuracy_score(actuals , predictions):

    # Same argument order as `accuracy_score` : actual answers first
    actuals = np.array(actuals)
    predictions = np.array(predictions)
    
    recall = []
    
    # Recall is computed separately for every label in the actual answers
    for label in np.unique(actuals):
        
        trues = 0     # actual is `label` and we predicted `label`
        falses = 0    # actual is `label` but we predicted something else
        
        for actu , predi in zip(actuals , predictions):
            
            if actu == label:
            
                if predi == label:
            
                    trues += 1
            
                else :
            
                    falses += 1
        
        recall.append(trues / (trues + falses))
    
    # Balanced accuracy is the mean of the per-label recalls
    return np.mean(np.array(recall))
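For comparison, here is a compact sketch of balanced accuracy (the mean of the per-label recalls) that uses boolean masks instead of an explicit inner loop. The name `balanced_accuracy_np` and the variables `teacher` and `ours` are made up for this example, and it assumes every label in `actuals` appears at least once, which is always the case for labels returned by `np.unique`.

```python
import numpy as np


def balanced_accuracy_np(actuals, predictions):
    actuals = np.asarray(actuals)
    predictions = np.asarray(predictions)
    # For each label: look only at the rows where the actual answer is that
    # label, and take the fraction of those rows we predicted correctly.
    recalls = [np.mean(predictions[actuals == label] == label)
               for label in np.unique(actuals)]
    return float(np.mean(recalls))


teacher = ["True"] * 5 + ["False"]
ours = ["True"] * 6

score = balanced_accuracy_np(teacher, ours)
print(score)  # 0.5
```

Answering `True` every time gives a recall of $1$ for `True` and $0$ for `False`, so the score is $0.5$ instead of the flattering $0.83$.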

# 1.3 | Top K Accuracy 

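Let's assume we have more than $2$ classes and want the accuracy of the best $k$ classes. That is what this simplified version of top k accuracy computes. Keep in mind that `sklearn.metrics.top_k_accuracy_score` works differently: it takes predicted scores and counts a prediction as correct when the true label is among the $k$ highest-scoring classes. 

Implementing our version is really simple.

We know that for calculating the accuracy we have this code 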

In [30]:
def accuracy_score(actuals , predictions):
    
    correct_ones = 0
    
    for actual  , predicted in zip(actuals , predictions):
        
        if predicted == actual:
            
            correct_ones += 1

    asc = correct_ones / len(predictions)

    return asc

Lets first change the name 

In [31]:
def top_k_accuracy(actuals , predictions):
    
    correct_ones = 0
    
    for actual  , predicted in zip(actuals , predictions):
        
        if predicted == actual:
            
            correct_ones += 1

    asc = correct_ones / len(predictions)

    return asc

First we need the `accuracy for each particular class`. For now the function returns a list containing the accuracy of all the classes, one entry per label 

In [32]:
def top_k_accuracy(actuals , predictions):
    
    k_labels = []
    
    for label in np.unique(predictions):
        
        correct_ones = 0

        for actual  , predicted in zip(actuals , predictions):

            if predicted == label and actual == label:

                correct_ones += 1

        asc = correct_ones / len(predictions)
        
        k_labels.append(asc)

    return k_labels    

Now we need to sort the list and return the first `k` elements of the list 

# 1.3.1 | Top K Accuracy Final Source Code

In [33]:
def top_k_accuracy(actuals , predictions , k):
    
    k_labels = []
    
    for label in np.unique(predictions):
        
        correct_ones = 0

        for actual  , predicted in zip(actuals , predictions):

            if predicted == label and actual == label:

                correct_ones += 1

        asc = correct_ones / len(predictions)
        
        k_labels.append(asc)

    # Sort from best to worst so that the first k entries are the top k
    k_labels = np.sort(np.array(k_labels))[::-1]
    
    return k_labels[:k]    
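For comparison, the more common definition of top k accuracy (the one used by `sklearn.metrics.top_k_accuracy_score`) works on predicted scores: a sample counts as correct when its true label is among the $k$ classes with the highest scores. Below is a minimal sketch of that idea, not the library's implementation. The name `top_k_accuracy_from_scores` and the score values are invented for illustration, and the labels are assumed to be integer class indices that match the columns of `scores`.

```python
import numpy as np


def top_k_accuracy_from_scores(actuals, scores, k):
    actuals = np.asarray(actuals)
    scores = np.asarray(scores)
    # Column indices of the k highest scores in every row
    top_k = np.argsort(scores, axis=1)[:, -k:]
    # A row is a hit when its true label is one of those k columns
    hits = (top_k == actuals[:, None]).any(axis=1)
    return float(hits.mean())


# Invented example: 4 samples, 3 classes, one score per class
actuals = [0, 1, 2, 2]
scores = [[0.5, 0.3, 0.2],
          [0.3, 0.4, 0.2],
          [0.2, 0.4, 0.3],
          [0.7, 0.2, 0.1]]

top_1 = top_k_accuracy_from_scores(actuals, scores, k=1)
top_2 = top_k_accuracy_from_scores(actuals, scores, k=2)
print(top_1, top_2)  # 0.5 0.75
```

With $k = 1$ this is the ordinary accuracy of the highest-scoring class; with $k = 2$ the third sample becomes a hit because its true class has the second highest score.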

# 2 | Average Precision Score

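Remember how we calculated the `accuracy` of the sample...?

For calculating the accuracy we used the formula 

$$\frac {\text{correct ones}}{\text{total ones}}$$

For the precision of a label we ask a different question: out of all the times we predicted that label, how many times were we right? 

$$precision = \frac {T_P}{T_P + F_P}$$

To calculate an average precision score we compute this for every label and take the mean. Note that `sklearn.metrics.average_precision_score` is a different, score-based metric that summarizes the precision-recall curve; the version here is the simpler mean of the per-label precisions 

# 2.1 | Average Precision Score Final Source Code

In [34]:
def average_precision_score(actuals , predictions):
    
    precisions = []
    
    for label in np.unique(predictions):
        
        correct_ones = 0
        predicted_ones = 0
        
        for actual  , predicted in zip(actuals , predictions):
            
            if predicted == label:
                
                predicted_ones += 1
                
                if actual == label:
                    
                    correct_ones += 1
        
        # Precision for this label : correct predictions / all predictions of it
        precisions.append(correct_ones / predicted_ones)
            
    aps = np.mean(np.array(precisions))

    return aps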
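For comparison, scikit-learn's `average_precision_score` is defined on predicted scores rather than hard answers: it sums the precision at each threshold weighted by the increase in recall, $AP = \sum_n (R_n - R_{n-1}) P_n$. Below is a minimal sketch of that definition for binary labels, not the library's implementation. The name `average_precision_from_scores` is made up, the positive label is assumed to be `1`, and all scores are assumed to be distinct.

```python
import numpy as np


def average_precision_from_scores(actuals, scores):
    actuals = np.asarray(actuals)
    scores = np.asarray(scores)
    # Rank the samples from the highest score to the lowest
    order = np.argsort(scores)[::-1]
    hits = actuals[order] == 1
    # Precision after each ranked sample: positives seen so far / samples seen so far
    ranks = np.arange(1, len(hits) + 1)
    precision_at_rank = np.cumsum(hits) / ranks
    # Recall only increases at positives, by 1 / (number of positives) each time,
    # so AP is the mean of the precision values at the positive positions
    return float(precision_at_rank[hits].sum() / hits.sum())


ap = average_precision_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(round(ap, 4))  # 0.8333
```

Here the two positives sit at ranks $1$ and $3$, with precision $1$ and $\frac {2}{3}$, so the average is $\frac {5}{6}$.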