# <p style='text-align: center;'> Confusion Matrix </p>

## Confusion Matrix :
- For a binary test or predictor, a confusion matrix is a two-by-two table that reports the number of true positives, false positives, true negatives and false negatives.


- We can make a confusion matrix if we know both the predicted values and the actual (true) values for a sample set.


- A confusion matrix is a table in which predictions are commonly represented in columns and actual status in rows. Sometimes this is reversed, with predictions in rows and actual status in columns, so always check the axis labels.


- The table makes it easy to see whether mislabeling has occurred and how often the predictions are correct.


- A confusion matrix is also known as an error matrix, and it is a type of contingency table.
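
As a rough illustration of the points above, the sketch below tallies a confusion matrix from paired actual and predicted labels using only the Python standard library. The function name `confusion_table` and the sample labels are hypothetical, chosen for illustration; rows hold actual labels and columns hold predicted labels.

```python
from collections import Counter


def confusion_table(actual, predicted, labels):
    """Return a nested list: rows = actual labels, columns = predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Count how often each (actual, predicted) pair occurs.
    pair_counts = Counter(zip(actual, predicted))
    return [[pair_counts[(a, p)] for p in labels] for a in labels]


# Hypothetical sample set of six cases.
actual = ["yes", "no", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

table = confusion_table(actual, predicted, labels=["yes", "no"])
for label, row in zip(["yes", "no"], table):
    print(f"actual={label}: {row}")
```

Swapping the two comprehension loops would give the reversed layout, with predictions in rows.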


## Terminology Related to a Confusion Matrix :

- Suppose your confusion matrix is a simple 2 by 2 table, given by :

![image-3.png](attachment:image-3.png)




- In the confusion matrix above, predictions are represented in rows and actual status is represented in columns.


**True Positive (TP) :**

- The number of times the model predicts positive and the actual value is positive. You predicted a positive value, and it is correct.
- Here the actual and predicted outcomes are the same: both are "Reject Ho".
- The probability of the True Positive (TP) = 1 - β (beta), the power of the test, and its value lies between 0 and 1.


**False Positive or FP (Type I error) :**
- The number of times our model wrongly predicts negative values as positives. You predicted a positive value, and it is actually negative.
- Here Actual is "Accept Ho" but prediction is "Reject Ho".
- It is the incorrect rejection of a true null hypothesis.
- The probability of the False Positive (FP) = α (alpha).
- The maximum allowed probability of this error is set in advance as α (the significance level).


**True Negative (TN) :**
- The number of times our actual negative values are equal to predicted negative values. You predicted a negative value, and it is actually negative.
- Here the actual and predicted outcomes are the same: both are "Accept Ho".
- The probability of the True Negative (TN) = 1 - α (alpha).
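- The TP and TN probabilities are not directly proportional. For a fixed sample size, lowering α (which raises 1 - α) generally raises β (which lowers 1 - β), so one is traded off against the other.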


**False Negative or FN (Type II error) :**
- The number of times our model wrongly predicts positive values as negatives. You predicted a negative value, and it is actually positive.
- Here Actual is "Reject Ho" but prediction is "Accept Ho".
- It is the incorrect acceptance of a false null hypothesis.
- The probability of the False Negative (FN) = β (beta), which depends on the sample size and on α.
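
The probabilities above are conditional on the actual state, so they can be estimated from the four counts by dividing within each actual class. The following is a minimal sketch, assuming that positive means "Reject Ho"; the function name `estimated_error_rates` and the counts passed to it are hypothetical.

```python
def estimated_error_rates(tp, fp, tn, fn):
    """Estimate alpha, beta and their complements from confusion-matrix counts."""
    actual_negatives = fp + tn  # cases where Ho is actually true
    actual_positives = tp + fn  # cases where Ho is actually false
    if actual_negatives == 0 or actual_positives == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    alpha = fp / actual_negatives  # estimated Type I error rate
    beta = fn / actual_positives   # estimated Type II error rate
    return {
        "alpha": alpha,
        "1 - alpha": 1 - alpha,
        "beta": beta,
        "1 - beta": 1 - beta,
    }


# Hypothetical counts, for illustration only.
rates = estimated_error_rates(tp=40, fp=5, tn=45, fn=10)
print(rates)
```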


## Classification Measure :
<b> The confusion matrix alone does not make the model's performance very clear. To find how accurate our model is, we use the following metrics or measures : </b>
    
**Accuracy :**    
- Accuracy is the proportion of correctly classified values. It tells us how often our classifier is right. It is the sum of true positives and true negatives divided by the total number of cases.


- Accuracy is appropriate when True Positives and True Negatives matter most. It is most informative on balanced data.
    
    
                       # Correct Predictions
           Accuracy = -----------------------
                          # Total Cases    
    
    
                            TP + TN
           Accuracy  = -------------------
                       (TP + TN + FP + FN)
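
To see why accuracy suits balanced data, consider the sketch below. It assumes a hypothetical imbalanced set of 95 actual negatives and 5 actual positives, and a model that always predicts negative; `accuracy` is an illustrative helper, not a library function.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no cases to evaluate")
    return (tp + tn) / total


# Hypothetical imbalanced data: 95 actual negatives, 5 actual positives.
# A model that always predicts negative gets every negative right
# and misses every positive.
always_negative = accuracy(tp=0, tn=95, fp=0, fn=5)
print(always_negative)  # 0.95
```

The score looks high even though not a single positive case was found.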
    
    
**Precision :**
- Precision measures how many of the model's positive predictions are correct. It is the true positives divided by the total number of predicted positive values.


- Use Precision when False Positives are the more costly error.
    
    
                               TP
           Precision = -------------------
                            (TP + FP)
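
The formula can be sketched as a small helper. Returning `None` when the model makes no positive predictions is an assumption made here for illustration (the ratio is undefined in that case); the counts in the calls are hypothetical.

```python
def precision(tp, fp):
    """Share of positive predictions that are actually positive."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return None  # undefined: the model made no positive predictions
    return tp / predicted_positives


print(precision(tp=30, fp=10))  # 0.75
print(precision(tp=0, fp=0))    # None
```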
    
    
**Recall or Sensitivity :**    
- Recall measures how many of the actual positive observations are predicted as positive. It is also known as Sensitivity. Recall is a valid choice of evaluation metric when we want to capture as many positives as possible.


- Recall is defined as the number of correctly classified positive cases divided by the total number of actual positive cases. In other words, out of all the positive cases, how many we have predicted correctly.


- Use Recall when False Negatives are the more costly error, i.e. when missing a positive case is worse than raising a false alarm.
    
    
                           TP
           Recall = -------------------
                        (TP + FN)
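
Recall on its own can be maximised by predicting positive for everything, which the sketch below illustrates. The counts are hypothetical (10 actual positives, 90 actual negatives), and returning `None` for an undefined ratio is an assumption of this example.

```python
def recall(tp, fn):
    """Share of actual positives that the model predicted as positive."""
    actual_positives = tp + fn
    if actual_positives == 0:
        return None  # undefined: there are no actual positives
    return tp / actual_positives


# Hypothetical model that predicts positive for every one of 100 cases:
# all 10 positives are caught, but all 90 negatives become false positives.
tp, fn, fp = 10, 0, 90
print(recall(tp, fn))  # 1.0
print(tp / (tp + fp))  # precision of the same model: 0.1
```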
    
 
**F-measure or F1-Score :**
- The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall. We use the harmonic mean because it is pulled toward the smaller of the two values, so one very large value cannot mask a small one, unlike the arithmetic mean.


- The F1 score balances the precision and recall of your classifier. If either the precision or the recall is low, the F1 score is low.


- F1-Score is used when both False Negatives and False Positives are important. It is usually a better metric than accuracy for imbalanced data.
    
    
                       2 * Precision * Recall
           F1-Score = ------------------------
                         Precision + Recall   
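
The difference between the harmonic mean and a simple average is easiest to see with one high and one low value. The sketch below uses hypothetical precision and recall values; returning `0.0` when both inputs are zero is a convention assumed here, not something the formula itself defines.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier: perfect precision, very poor recall.
p, r = 1.0, 0.1
arithmetic_mean = (p + r) / 2
print(round(arithmetic_mean, 3))  # 0.55
print(round(f1_score(p, r), 3))   # 0.182
```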


-----------------------
### Examples :
--------------------

<b> Example - 1 : </b>
1. Draw the confusion matrix for a person who has covid and a person who does not have covid.


<b> Ans : </b>
- Null Hypothesis (Ho) : Person does not have covid (no action)---------> Accept Ho
- Alternative Hypothesis (H1) : Person has covid (action required)---> Reject Ho
    
![image.png](attachment:image.png)
    
    
- The table above is a confusion matrix for a person who has covid and a person who does not have covid.

<b> Example - 2 : </b>
- We have a total of 20 cats and dogs and our model predicts whether each animal is a cat or not.
    
    
- Actual values = [‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]
    
    
- Predicted values = [‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘cat’, ‘cat’, ‘cat’, ‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’]
    
    
![image.png](attachment:image.png)

**True Positive (TP) = 6**
- You predicted positive and it’s true. You predicted that an animal is a cat and it actually is.


**True Negative (TN) = 11**
- You predicted negative and it’s true. You predicted that an animal is not a cat and it actually is not (it’s a dog).


**False Positive (Type 1 Error) (FP) = 2**
- You predicted positive and it’s false. You predicted that an animal is a cat but it actually is not (it’s a dog).


**False Negative (Type 2 Error) (FN) = 1**
- You predicted negative and it’s false. You predicted that an animal is not a cat but it actually is.
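
These four counts can be checked by tallying the two lists directly. The sketch below copies the actual and predicted values from this example and treats "cat" as the positive class; `count_outcomes` is a hypothetical helper name.

```python
actual = ["dog", "cat", "dog", "cat", "dog", "dog", "cat", "dog", "cat", "dog",
          "dog", "dog", "dog", "cat", "dog", "dog", "cat", "dog", "dog", "cat"]
predicted = ["dog", "dog", "dog", "cat", "dog", "dog", "cat", "cat", "cat", "cat",
             "dog", "dog", "dog", "cat", "dog", "dog", "cat", "dog", "dog", "cat"]


def count_outcomes(actual, predicted, positive):
    """Return (tp, fp, tn, fn) for the given positive class label."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


tp, fp, tn, fn = count_outcomes(actual, predicted, positive="cat")
print(tp, fp, tn, fn)  # 6 2 11 1
```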

### Classification Measures :
**1. Accuracy :**


                       # Correct Predictions
           Accuracy = -----------------------
                          # Total Cases    
    
    
                            TP + TN
           Accuracy  = -------------------
                       (TP + TN + FP + FN)
                       
                       
                            6 + 11
           Accuracy = ----------------------- = 0.85 = 85%
                          6 + 11 + 2 + 1
                          
                          
**2. Precision :**

    
                               TP
           Precision = -------------------
                            (TP + FP)
                            
                            
                               6                         
           Precision = -------------------  = 0.75 = 75%
                             6 + 2
                             
                            
**3. Recall :**

    
                           TP
           Recall = -------------------
                        (TP + FN)
                        
                        
                            6                      
           Recall = -------------------  = 0.857 ≈ 86%
                         6 + 1
                         
                         
**4. F-measure / F1-Score :**


                       2 * Precision * Recall
           F1-Score = ------------------------
                         Precision + Recall   
                         
                         
                         2 * (0.857 * 0.75)
           F1-Score = ------------------------  = 0.80 = 80%
                            0.857 + 0.75
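
As a cross-check, the four measures can be recomputed from the counts of Example 2 (TP = 6, TN = 11, FP = 2, FN = 1). This is a plain-Python sketch using the formulas from this section; the variable names are illustrative.

```python
# Counts from Example 2, with "cat" as the positive class.
tp, tn, fp, fn = 6, 11, 2, 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("recall", recall), ("f1", f1)]:
    print(f"{name}: {value:.3f}")
# accuracy: 0.850
# precision: 0.750
# recall: 0.857
# f1: 0.800
```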