# **Classification Measures**

In regression, we measured how well a model performed with the $R^2$ score, also known as the coefficient of determination.
In this lecture, we shall explore the corresponding measures for **classification algorithms**.

Perhaps the most obvious measure that comes to mind is **accuracy**.

### **Accuracy**

Accuracy is either the fraction or the count of correct predictions made.
In multilabel classification, where each sample carries a set of labels, a sample counts as correct only if its entire set of predicted labels strictly matches the true set of labels. This stricter variant is called subset accuracy.
Let's take a look at an example.

In [None]:
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]

It can easily be seen that, among the 4 values above, only 2 have been predicted correctly. Hence, we have achieved an accuracy of 0.5.

We may also say that we have achieved 50% accuracy.
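To make the arithmetic explicit, here is a minimal sketch that computes the same value by hand, without any library. The helper name `manual_accuracy` is made up for illustration and is not part of Sklearn.

```python
def manual_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label.

    Hypothetical helper for illustration only; not a Sklearn function.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the positions where prediction and truth agree.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
print(manual_accuracy(y_true, y_pred))  # 2 correct out of 4 -> 0.5
```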

Sklearn offers a function to find the accuracy.

In [None]:
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
print("Score :", accuracy_score(y_true, y_pred))

Score : 0.5


In [None]:
print("Percentage :", accuracy_score(y_true, y_pred) * 100, "%")

Percentage : 50.0 %


If you want to find out the exact number of correct predictions, you may set the `normalize` parameter to `False`.

In [None]:
accuracy_score(y_true, y_pred, normalize=False)

2

**Why isn't accuracy always the best option?**

Suppose you have a highly skewed dataset. Suppose a dataset has 100 datapoints, with 95 zeros and 5 ones.

In [None]:
y_true = [0]*95 + [1]*5

Now, let's assume that our ML model always predicts the answer to be 0, no matter what input it has been given. Hence, it will give us `y_pred` as:


In [None]:
y_pred = [0]*100

In [None]:
print("Score :", accuracy_score(y_true, y_pred))

Score : 0.95


We are getting 95% accuracy, which suggests that our model is quite good. But we know that this isn't the case: our model predicts 0 for any input it is given, so it never identifies a single 1 and is a very bad model.

This is why accuracy isn't always the most reliable way to find out how good our ML model actually is.
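One way to see the problem is to look at each class separately instead of the dataset as a whole. The sketch below is a hand-rolled illustration using only the standard library, with variable names chosen here for the example. It counts how many samples of each true class were predicted correctly.

```python
from collections import Counter

y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

# How many samples each true class has.
total_per_class = Counter(y_true)
# How many samples of each true class were predicted correctly.
correct_per_class = Counter(t for t, p in zip(y_true, y_pred) if t == p)

for label in sorted(total_per_class):
    fraction = correct_per_class[label] / total_per_class[label]
    print(f"Class {label}: {correct_per_class[label]}/{total_per_class[label]} correct = {fraction}")
```

Class 0 comes out at 95/95, while Class 1 comes out at 0/5. The overall accuracy of 0.95 hides the fact that the minority class is never predicted correctly.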

### **Confusion Matrix**
A confusion matrix is useful because it shows not only how many predictions the model got right, but also which classes it confuses with one another. Knowing how the model goes wrong helps us think about how to improve it further.

There are some terms that one must know regarding confusion matrices.

1. **True Positives:** This is the number of samples predicted positive which were actually positive.
2. **True Negatives:** This is the number of samples predicted negative which were actually negative.
3. **False Positives:** This is the number of samples predicted positive which were **not** actually positive.
4. **False Negatives:** This is the number of samples predicted negative which were **not** actually negative.

In the case of multi-class classification, however, the confusion matrix shows the number of samples predicted correctly and wrongly for each class, instead of true positives and so on.
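The four terms above can be counted directly from a pair of label lists. The following is a minimal sketch on a small made-up binary example. The lists `actual` and `predicted` and the helper `binary_counts` are hypothetical names used only for illustration, and label 1 is assumed to be the positive class.

```python
def binary_counts(actual, predicted, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem.

    Hypothetical helper; `positive` marks which label counts as "positive".
    """
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
tp, tn, fp, fn = binary_counts(actual, predicted)
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)  # TP: 3 TN: 3 FP: 1 FN: 1
```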

Let's see how the confusion matrix helps us out.

In [None]:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_true,y_pred))

[[95  0]
 [ 5  0]]


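In this matrix, each row corresponds to a true class and each column to a predicted class. The second column, which counts predictions of Class 1, contains only zeros, so we can easily see that something is wrong with our model: it never predicts Class 1.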

Let's look at another example: **The Iris Dataset**

In [None]:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [None]:
iris = datasets.load_iris()

In [None]:
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state = 1)

In [None]:
clf = LogisticRegression(solver = 'liblinear')
clf.fit(x_train, y_train)
y_train_pred = clf.predict(x_train)
y_test_pred = clf.predict(x_test)

In [None]:
confusion_matrix(y_train, y_train_pred)

array([[37,  0,  0],
       [ 0, 28,  6],
       [ 0,  0, 41]])

**Inference:** From the above confusion matrix formed on the training data, we notice the following things.

Class 0 had 37 datapoints, and all of them were labelled correctly.

Class 1 had 34 datapoints, out of which 28 were predicted correctly and 6 were mislabelled as "Class 2".

Class 2 had 41 datapoints, and all of them were labelled correctly.
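The inferences above come from reading each row of the matrix: the row sum is the number of datapoints in that true class, and the diagonal entry is the number labelled correctly. Here is a small sketch of that reading, with the training matrix copied in as a plain list (the name `train_cm` is chosen here for illustration).

```python
# Training confusion matrix from above: rows are true classes, columns are predicted classes.
train_cm = [
    [37, 0, 0],
    [0, 28, 6],
    [0, 0, 41],
]

for label, row in enumerate(train_cm):
    total = sum(row)      # datapoints that truly belong to this class
    correct = row[label]  # diagonal entry: predicted as the true class
    print(f"Class {label}: {total} datapoints, {correct} correct, {total - correct} mislabelled")
```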

In [None]:
confusion_matrix(y_test, y_test_pred)

array([[13,  0,  0],
       [ 0, 10,  6],
       [ 0,  0,  9]])

You can draw your own inferences for the testing data in a similar way.

### **Classification Report**

There are measures other than the confusion matrix which can help achieve better understanding and analysis of our model and its performance. We talk about two particular measures here - precision and recall.



**Precision**

Precision is the fraction of samples predicted as a certain class that actually belong to that class.

**Recall**

Recall is the fraction of samples actually belonging to a certain class that were correctly predicted as that class.

Note that precision and recall will be defined per class label, not for the dataset as a whole. 

However, how do we choose between precision and recall? Which one is the better metric? It turns out that we can use a single metric which combines both of these: **the F1 score.**

**F1 Score**

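The F1 score is defined as the harmonic mean of precision and recall, $F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$. Because the harmonic mean is pulled towards the smaller of the two values, it is usually a better single indicator of model performance than either precision or recall alone.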
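As a rough illustration of how the harmonic mean behaves, the sketch below computes F1 for a couple of made-up precision and recall values. The helper `f1_from` is a hypothetical name, and returning 0.0 when both inputs are zero is an assumption made here to avoid dividing by zero.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * precision * recall / (precision + recall)


# Illustrative values only: the arithmetic mean of each pair would be 0.75 and 0.5.
print(round(f1_from(1.0, 0.5), 2))  # 0.67
print(round(f1_from(0.9, 0.1), 2))  # 0.18
```

In both cases the F1 score sits much closer to the smaller of the two values, so a model cannot reach a high F1 score by doing well on only one of precision or recall.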

Let's look at the report for the Iris Dataset itself.

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_train, y_train_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        37
           1       1.00      0.82      0.90        34
           2       0.87      1.00      0.93        41

    accuracy                           0.95       112
   macro avg       0.96      0.94      0.95       112
weighted avg       0.95      0.95      0.95       112



Let's look deeper into the report for the testing data.

In [None]:
print(confusion_matrix(y_test, y_test_pred))
print(classification_report(y_test, y_test_pred))

[[13  0  0]
 [ 0 10  6]
 [ 0  0  9]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      0.62      0.77        16
           2       0.60      1.00      0.75         9

    accuracy                           0.84        38
   macro avg       0.87      0.88      0.84        38
weighted avg       0.91      0.84      0.84        38



We predicted 13 values to belong to **Class 0**, out of which all 13 were labelled correctly by us, hence precision is $\frac{13}{13} = 1$.

We had 13 values in **Class 0**, and we recalled all of them, hence recall is $\frac{13}{13} = 1$.




We predicted 10 values to belong to **Class 1**, out of which all 10 were labelled correctly by us, hence precision is $\frac{10}{10} = 1$.

We had 16 values in **Class 1**, and we recalled 10 of them, hence recall is $\frac{10}{16} = 0.625$, which the report displays as 0.62.

We predicted 15 values to belong to **Class 2**, out of which 9 were labelled correctly by us, hence precision is $\frac{9}{15} = 0.60$.

We had 9 values in **Class 2**, and we recalled all of them, hence recall is $\frac{9}{9} = 1$.
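The same numbers can be read off the confusion matrix mechanically: for each class, the diagonal entry divided by the column sum gives precision, and divided by the row sum gives recall. The following is a minimal sketch with the test matrix copied in as a plain list. The names `test_cm` and `results` are chosen here for illustration, and the code assumes every class is predicted at least once, which holds for this matrix.

```python
# Test confusion matrix from above: rows are true classes, columns are predicted classes.
test_cm = [
    [13, 0, 0],
    [0, 10, 6],
    [0, 0, 9],
]

n = len(test_cm)
results = []
for k in range(n):
    tp = test_cm[k][k]                               # correctly predicted as class k
    predicted_k = sum(test_cm[i][k] for i in range(n))  # column sum: all predictions of class k
    actual_k = sum(test_cm[k])                       # row sum: all true members of class k
    precision = tp / predicted_k
    recall = tp / actual_k
    results.append((k, precision, recall))
    print(f"Class {k}: precision = {tp}/{predicted_k} = {precision:.3f}, "
          f"recall = {tp}/{actual_k} = {recall:.3f}")
```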

## **Some other metrics in Sklearn**

### **Precision**

The precision is the ratio $t_p / (t_p + f_p)$ where $t_p$ is the number of true positives and $f_p$ the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The best value is 1 and the worst value is 0.

In [None]:
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
from sklearn.metrics import precision_score
precision_score(y_true, y_pred, average='micro')

0.3333333333333333
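To see where this value comes from, recall that micro averaging pools the true positives and false positives of all classes before taking the ratio. The sketch below recomputes the value by hand under that definition. The variable names are chosen here for illustration.

```python
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

labels = sorted(set(y_true) | set(y_pred))
total_tp = 0
total_fp = 0
for label in labels:
    # Predictions of `label` that were right (true positives) or wrong (false positives).
    total_tp += sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
    total_fp += sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)

micro_precision = total_tp / (total_tp + total_fp)
print(total_tp, total_fp, micro_precision)  # 2 4 0.3333333333333333
```

Only the two predictions of class 0 at the first and fourth positions are correct, so the pooled ratio is $2 / (2 + 4)$.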

There are a few more options for the `average` parameter. You may refer to the documentation.

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html