# <p style='text-align: center;'> Evaluation Metrics for Classification </p>

In a classification problem, the category or class of each data point is identified based on training data. The model learns from the given dataset and then assigns new data to classes or groups based on that training. It predicts class labels as the output, such as Yes or No, 0 or 1, Spam or Not Spam, etc. To evaluate the performance of a classification model, different metrics are used, and some of them are as follows :

   - Accuracy
   - Confusion Matrix
   - Precision
   - Recall
   - F-Score
   - AUC(Area Under the Curve)-ROC
   

## I. Accuracy :
- Accuracy measures the proportion of correctly classified values. It is the ratio of the number of correct predictions to the total number of predictions.
    
    
- Accuracy is used when True Positives and True Negatives matter most. It is a better metric for balanced data than for imbalanced data.


- It can be formulated as :
    
    
                       Number of Correct Predictions
           Accuracy = -------------------------------
                        Total Number of Predictions
    
    
                            TP + TN
           Accuracy  = -------------------
                       (TP + TN + FP + FN)
                       
                       
- It is good to use the Accuracy metric when the target variable classes in the data are approximately balanced. For example, suppose 60% of the images in a fruit dataset are of Apple and 40% are of Mango. If the model is asked to predict whether an image is of Apple or Mango, neither class dominates, so a high accuracy (say 97%) is a meaningful indication that the model separates the two classes well.


- It is recommended not to use the Accuracy measure when the target variable mostly belongs to one class. For example, suppose there is a disease prediction model and, out of 100 people, only five have the disease and 95 don't. If the model predicts that every person has no disease (a useless model, since it never detects the disease), the Accuracy will still be 95%, which is misleading.
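
The disease example above can be checked with a few lines of Python. This is only a minimal sketch: the `accuracy` helper and the label encoding (1 = disease, 0 = no disease) are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 100 people: 5 have the disease (1), 95 do not (0).
y_true = [1] * 5 + [0] * 95
# A trivial model that predicts "no disease" for everyone.
y_pred_all_negative = [0] * 100

acc = accuracy(y_true, y_pred_all_negative)
# How many of the 5 sick people did the model actually find?
detected = sum(1 for t, p in zip(y_true, y_pred_all_negative) if t == 1 and p == 1)

print(acc)       # 0.95
print(detected)  # 0
```

The accuracy looks high even though not a single sick person is detected, which is why accuracy alone should not be trusted on imbalanced data.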


## II. Misclassification Rate (Error Rate) :
- The Misclassification Rate is a performance metric that tells you the fraction of the predictions that were wrong, without distinguishing between positive and negative predictions.


- The formula for calculating Misclassification Rate (Error Rate) is given below :


                         Number of Incorrect Predictions
           Error Rate = ---------------------------------
                           Total Number of Predictions

    
                               FP + FN
           Error Rate  = ------------------- = 1 - Accuracy
                         (TP + TN + FP + FN)
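
As a quick check of the formula, the sketch below computes the error rate from the four confusion-matrix counts and confirms that it complements the accuracy. The helper names and the counts are hypothetical, chosen only for illustration.

```python
def error_rate(tp, tn, fp, fn):
    """Fraction of wrong predictions: (FP + FN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (fp + fn) / total


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts, not taken from any real dataset.
counts = dict(tp=40, tn=45, fp=5, fn=10)
err = error_rate(**counts)
acc = accuracy(**counts)

print(err)  # 0.15
print(acc)  # 0.85
```

Every prediction is either correct or incorrect, so the two values always add up to 1.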


## III. Confusion Matrix :
- A confusion matrix is a tabular representation of prediction outcomes of any binary classifier, which is used to describe the performance of the classification model on a set of test data when true values are known.


- For a binary classifier, the confusion matrix is a two-by-two table that gives the number of true positives, true negatives, false positives and false negatives for a test or predictor.


- We can make a confusion matrix if we know both the predicted values and the actual (true) values for a sample set.


- A confusion matrix is a table in which predictions are represented in columns and actual status is represented by rows. Sometimes this is reversed, with predictions in rows and actual instances in columns, so always check the axis labels.


- The table makes it easy to see whether mislabeling has occurred, which classes are confused with each other, and whether the predictions are more or less correct.


- A confusion matrix is also known as an error matrix, and it is a type of contingency table.


## Terminology Related to a Confusion Matrix :

- Suppose your confusion matrix is a simple 2 by 2 table, given by :

![image-2.png](attachment:image-2.png)

<b> We can determine the following from the above matrix : </b>
- In the matrix, columns are for the predicted values, and rows specify the actual values. Here both Actual and Predicted have two possible classes, Yes or No. So, if we are predicting the presence of a disease in a patient, a Yes in the Prediction column means the patient is predicted to have the disease, and a No means the patient is predicted not to have the disease.


- In this example, the total number of predictions is 165, out of which the model predicted Yes 110 times and No 55 times.


- In reality, however, 105 patients have the disease and 60 patients do not.
    
    
<b> In general, the cells of the table correspond to four terms, which are as follows : </b>
1. True Positive (TP): The model predicted positive, and the actual class is positive.
2. True Negative (TN): The model predicted negative, and the actual class is negative.
3. False Positive (FP): The model predicted positive, but the actual class is negative.
4. False Negative (FN): The model predicted negative, but the actual class is positive.
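
The four terms above can be counted directly from paired lists of actual and predicted labels. The following is a minimal sketch; `confusion_counts` is a hypothetical helper and the eight labels are made-up toy data, not the values from the matrix image.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


# Toy data for illustration only.
y_true = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]
y_pred = ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive="Yes")
print(tp, tn, fp, fn)  # 3 3 1 1

# Rows are actual classes, columns are predicted classes.
print("            Pred No  Pred Yes")
print(f"Actual No   {tn:7d}  {fp:8d}")
print(f"Actual Yes  {fn:7d}  {tp:8d}")
```

All of the metrics in the following sections can be derived from these four counts.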

## IV. Precision :
- Accuracy alone can be misleading, especially on imbalanced data, and is often not enough to assess the performance of a classifier. That's why precision is used.


- The precision metric addresses this limitation of Accuracy by focusing only on the positive predictions.


- Precision is the proportion of positive predictions that were actually correct. It is calculated as the ratio of True Positives to the total number of positive predictions (True Positives plus False Positives).



- Use Precision whenever False Positives are the more costly error. If we want to minimize false positives, precision should be close to 100%, i.e. maximizing precision minimizes the FP errors.


- The formula for calculating Precision is given below :
    
    
                               TP
           Precision = -------------------
                            (TP + FP)
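
The formula translates directly into code. In this sketch the `precision` helper, the spam-filter counts, and the choice to return 0.0 when there are no positive predictions are all assumptions made for illustration (precision is strictly undefined in that case).

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Undefined when nothing is predicted positive; 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return tp / predicted_positive


# Hypothetical spam filter: 50 emails flagged as spam, 45 of them really are spam.
p = precision(tp=45, fp=5)
print(p)  # 0.9
```

A precision of 0.9 means that 1 in 10 flagged emails was a false alarm.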
                            
                            
  

## V. Recall or Sensitivity :
- Recall is similar to the Precision metric; however, it measures the proportion of actual positives that were identified correctly. It is calculated as the ratio of True Positives to the total number of actual positives, whether correctly predicted as positive or incorrectly predicted as negative (True Positives plus False Negatives).


- In other words, Recall is the number of correctly classified positive examples divided by the total number of positive examples. Out of all the actual positives, how many did we predict correctly?


- Recall is a useful metric in cases where a False Negative is more costly than a False Positive.


- Use Recall whenever False Negatives are the more costly error. If we want to minimize false negatives, Recall should be as close to 100% as possible, i.e. maximizing recall minimizes the FN errors.


- The formula for calculating Recall is given below :


                           TP
           Recall = -------------------
                        (TP + FN)
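
A short sketch of the recall formula follows. The `recall` helper and the 0.0 fallback for the undefined case are assumptions; the first call reuses the disease scenario from the Accuracy section, where a model that predicts "no disease" for everyone misses all five sick people.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    actual_positive = tp + fn
    if actual_positive == 0:
        # Undefined when there are no actual positives; 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return tp / actual_positive


# Model that predicts "no disease" for everyone: 0 of 5 sick people found.
r_all_negative = recall(tp=0, fn=5)
# Hypothetical better model: finds 8 of 10 sick people.
r_better = recall(tp=8, fn=2)

print(r_all_negative)  # 0.0
print(r_better)        # 0.8
```

The 95%-accurate "always negative" model has a recall of 0, which exposes the failure that accuracy hides.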




## VI. F-Scores (F1-Scores) :
- The F1 score is the harmonic mean of Precision and Recall, assigning equal weight to each of them.


- The harmonic mean is the appropriate average for rates, i.e. ratios of two quantities. Precision and Recall are both ratios with the same numerator (TP) but different denominators, so the harmonic mean is used instead of the arithmetic mean.


- The F1 score is a number between 0 and 1. Unlike the arithmetic mean, the harmonic mean is dominated by the smaller of the two values, so one very high value cannot compensate for a very low one.


- The F1 score maintains a balance between the precision and recall of your classifier. If the precision is low, the F1 score is low, and if the recall is low, the F1 score is again low.


- F1-Score is used when both False Negatives and False Positives are important. F1-Score is a better metric than Accuracy for imbalanced data.


- The formula for calculating the F1 score is given below :
    
    
                       2 * Precision * Recall
           F1-Score = ------------------------
                         Precision + Recall   
                         
                         
<b> When to use F-Score? </b>
- As the F-score makes use of both precision and recall, it should be used when both of them are important for evaluation. The F1 score weights them equally; if one of them (precision or recall) is more important than the other, for example when false negatives are costlier than false positives or vice versa, the more general F-beta score can be used, which weights recall beta times as much as precision.
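
To see how the harmonic mean penalizes an imbalance between precision and recall, here is a minimal sketch. The `f1_score` helper, the 0.0 fallback when both inputs are zero, and the input values are assumptions chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention for this sketch; avoids division by zero
    return 2 * precision * recall / (precision + recall)


# Both metrics equal: F1 equals that common value.
balanced = f1_score(0.8, 0.8)
# Perfect precision but very poor recall.
lopsided = f1_score(1.0, 0.1)
# The arithmetic mean of the same lopsided pair, for comparison.
arithmetic = (1.0 + 0.1) / 2

print(f"{balanced:.3f}")    # 0.800
print(f"{lopsided:.3f}")    # 0.182
print(f"{arithmetic:.3f}")  # 0.550
```

The arithmetic mean would rate the lopsided classifier at 0.55, while the F1 score stays close to the weaker of the two values.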


## VII. AUC-ROC :
- Sometimes we need to visualize the performance of the classification model on charts; then, we can use the AUC-ROC curve. It is one of the popular and important metrics for evaluating the performance of the classification model.


- As a rough rule of thumb, AUC values between 0.9 and 1.0 are considered excellent, 0.8 to 0.9 good, 0.7 to 0.8 fair, 0.6 to 0.7 poor, and 0.5 to 0.6 a failure (0.5 corresponds to random guessing).


- ROC stands for Receiver Operating Characteristic. The ROC curve is a graph that shows the performance of a classification model at different threshold levels. The curve is plotted between two parameters, which are:

   - True Positive Rate (TPR)
   - False Positive Rate (FPR)
   
   
- TPR or True Positive Rate is a synonym for Recall, and hence can be calculated as :


                        TP
           TPR = -------------------
                     (TP + FN)
                     
                     
- FPR or False Positive Rate can be calculated as  :
                     
                     
                     
                        FP
           FPR = ------------------- = 1- TNR
                     (FP + TN)
                     
                     
<b> TNR and FNR are not required to plot the ROC curve, but they are included here for completeness : </b>
- TNR or True Negative Rate is the recall of the negative class, also called **Specificity**, and can be calculated as :


                        TN
           TNR = -------------------
                     (TN + FP)
    
    
    
- FNR or False Negative Rate can be calculated as  :
                     
                     
                     
                        FN
           FNR = ------------------- = 1 - TPR
                     (FN + TP)
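
The four rates and the two complement identities (FPR = 1 - TNR and FNR = 1 - TPR) can be verified numerically. The `rates` helper and the counts below are hypothetical and serve only as an illustration.

```python
def rates(tp, tn, fp, fn):
    """Return TPR, FNR, TNR and FPR computed from confusion-matrix counts."""
    actual_positive = tp + fn
    actual_negative = tn + fp
    if actual_positive == 0 or actual_negative == 0:
        raise ValueError("both classes must be present")
    return {
        "TPR": tp / actual_positive,  # recall / sensitivity
        "FNR": fn / actual_positive,
        "TNR": tn / actual_negative,  # specificity
        "FPR": fp / actual_negative,
    }


# Hypothetical counts, not taken from any real dataset.
r = rates(tp=40, tn=45, fp=5, fn=10)
print(r)  # {'TPR': 0.8, 'FNR': 0.2, 'TNR': 0.9, 'FPR': 0.1}
```

TPR and FNR share the denominator TP + FN, and TNR and FPR share the denominator TN + FP, which is why each pair sums to 1.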
    
    
- To compute the points on a ROC curve, we could evaluate a classifier such as a logistic regression model many times with different classification thresholds, but this would be inefficient. Instead, the whole curve is usually summarized by a single number that can be computed efficiently, known as AUC.
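
The brute-force idea described above, re-evaluating the predictions at many thresholds, can be sketched as follows. The `roc_points` helper, the 0/1 label encoding, the rule "predict positive when score >= threshold", and the four toy scores are assumptions made for illustration.

```python
def roc_points(y_true, scores):
    """Return one (FPR, TPR) point per candidate threshold.

    An example is predicted positive when its score is >= the threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    # Every distinct score is a candidate threshold; +inf gives the (0, 0) point.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)

    points = []
    for threshold in thresholds:
        # Recount TP and FP from scratch for each threshold (simple but slow).
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy labels and predicted scores for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
print(points)
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Each threshold requires a full pass over the data, so this approach costs roughly (number of thresholds) x (number of examples) operations, which is the inefficiency mentioned above.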
    
    
## AUC: Area Under the ROC curve :
- AUC stands for Area Under the ROC Curve. As its name suggests, AUC measures the two-dimensional area under the entire ROC curve, as shown in the image below :
                     
![image.png](attachment:image.png)
    
    
- AUC aggregates the performance across all classification thresholds into a single measure. The value of AUC ranges from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0, whereas a model whose predictions are 100% correct has an AUC of 1.0.
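
One way to obtain this area is to sort the examples by score once, walk along the ROC curve, and add up trapezoids. The sketch below is not a reference implementation: the `roc_auc` name, the 0/1 label encoding and the toy data are assumptions made for illustration.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via a single sorted sweep and the trapezoidal rule."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    # Highest score first: lowering the threshold admits examples in this order.
    ordered = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    i = 0
    while i < len(ordered):
        current_score = ordered[i][0]
        # Consume all examples with the same score so ties form one ROC point.
        while i < len(ordered) and ordered[i][0] == current_score:
            if ordered[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr, tpr = fp / negatives, tp / positives
        # Trapezoid between the previous ROC point and the current one.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


auc_mixed = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
auc_perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
auc_inverted = roc_auc([1, 1, 0, 0], [0.1, 0.2, 0.8, 0.9])

print(auc_mixed)     # 0.75
print(auc_perfect)   # 1.0
print(auc_inverted)  # 0.0
```

The last two calls reproduce the extremes described above: a perfectly ranked set of predictions gives 1.0 and a perfectly inverted one gives 0.0.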
    
    
<b> When to Use AUC : </b>
- AUC should be used when what matters is how well the predictions are ranked rather than their absolute values (it is scale-invariant). Moreover, it measures the quality of the model's predictions irrespective of the classification threshold chosen.
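
The ranking view can be made concrete with the standard interpretation of AUC (not stated in the text above) as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses a hypothetical `auc_pairwise` helper and toy scores to show that rescaling the scores without changing their order leaves AUC unchanged.

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy labels and scores for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc_raw = auc_pairwise(y_true, scores)
# Order-preserving transformations of the scores.
auc_scaled = auc_pairwise(y_true, [100 * s for s in scores])
auc_cubed = auc_pairwise(y_true, [s ** 3 for s in scores])

print(auc_raw, auc_scaled, auc_cubed)  # 0.75 0.75 0.75
```

Only the order of the scores matters here, so AUC says nothing about whether a score of 0.8 really corresponds to an 80% chance of being positive.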

    
<b> When not to use AUC : </b>

- Scale invariance is not always desirable. When we need well-calibrated probability outputs, AUC is not a suitable metric, because it does not tell us about calibration. Further, AUC is not a useful metric when there are wide disparities in the cost of false negatives vs. false positives and it is critical to minimize one type of classification error.
                     