# Confusion Matrix

In this notebook, we introduce the confusion matrix and the performance metrics that can be derived from it.

A **confusion matrix** is a performance measurement tool for classification models. It summarizes the predictions of a model on a set of test data for which the true values are known. The matrix helps to visualize how well a classification model is performing by showing the counts of actual vs. predicted classifications across all possible classes.



## **Structure of a confusion matrix**

Consider a binary classification task where we have only two classes: positive and negative. A confusion matrix for such a task has the following structure:



**(I) Binary Classification**

$$
\begin{array}{|c|c|c|}
\hline
 & \textbf{Predicted Positive} & \textbf{Predicted Negative} \\
\hline
\textbf{Actual Positive} & \text{True Positive (TP)} & \text{False Negative (FN)} \\
\hline
\textbf{Actual Negative} & \text{False Positive (FP)} & \text{True Negative (TN)} \\
\hline
\end{array}
$$


__Components:__

1. **True Positive (TP)**: The model correctly predicted the positive class.
2. **False Negative (FN)**: The model predicted negative, but the actual class was positive (also called Type II error).
3. **False Positive (FP)**: The model predicted positive, but the actual class was negative (also called Type I error).
4. **True Negative (TN)**: The model correctly predicted the negative class.
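
The four components can be counted directly from paired labels. The following is a minimal sketch in plain Python; the labels and the helper name `binary_confusion_counts` are hypothetical and chosen only for illustration.

```python
# Hypothetical labels for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

def binary_confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification task."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # correctly predicted positive
        elif actual == positive:
            fn += 1  # missed positive (Type II error)
        elif predicted == positive:
            fp += 1  # false alarm (Type I error)
        else:
            tn += 1  # correctly predicted negative
    return tp, fn, fp, tn

tp, fn, fp, tn = binary_confusion_counts(y_true, y_pred)

# Same layout as the table: rows = actual, columns = predicted, positive first.
print([[tp, fn], [fp, tn]])  # [[3, 1], [1, 3]]
```

If scikit-learn is available, `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=[1, 0])` should give the same layout. With its default sorted label order, 0/1 labels produce `[[TN, FP], [FN, TP]]` instead, so it is worth checking the order before reading off the cells.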



**(II) Multiclass Classification**

In multiclass problems, the confusion matrix extends to more classes. Each row corresponds to the actual class, and each column corresponds to the predicted class.

For instance, for three classes (A, B, C), the confusion matrix could look like:

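$$
\begin{array}{|c|c|c|c|}
\hline
 & \textbf{Predicted A} & \textbf{Predicted B} & \textbf{Predicted C} \\
\hline
\textbf{Actual A} & n_{AA} & n_{AB} & n_{AC} \\
\hline
\textbf{Actual B} & n_{BA} & n_{BB} & n_{BC} \\
\hline
\textbf{Actual C} & n_{CA} & n_{CB} & n_{CC} \\
\hline
\end{array}
$$

Here $n_{ij}$ is the number of samples whose actual class is $i$ and whose predicted class is $j$. The diagonal entries are the correct predictions. TP, FN, and FP are defined per class: for class A, $TP_A = n_{AA}$, $FN_A = n_{AB} + n_{AC}$ (the rest of row A), and $FP_A = n_{BA} + n_{CA}$ (the rest of column A).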
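
With rows as actual classes and columns as predicted classes, per-class counts can be read off a multiclass matrix in a one-vs-rest fashion. The sketch below uses hypothetical labels and helper names (`confusion_matrix_counts`, `per_class_counts`) purely for illustration.

```python
# Hypothetical three-class labels for illustration.
labels = ["A", "B", "C"]
y_true = ["A", "A", "A", "B", "B", "C", "C", "C", "C"]
y_pred = ["A", "B", "A", "B", "C", "C", "A", "C", "C"]

def confusion_matrix_counts(y_true, y_pred, labels):
    """Build a matrix with rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def per_class_counts(matrix, k):
    """Return (TP, FN, FP, TN) for class index k, treating it as 'positive'."""
    tp = matrix[k][k]                         # diagonal entry
    fn = sum(matrix[k]) - tp                  # rest of row k
    fp = sum(row[k] for row in matrix) - tp   # rest of column k
    total = sum(sum(row) for row in matrix)
    tn = total - tp - fn - fp                 # everything else
    return tp, fn, fp, tn

matrix = confusion_matrix_counts(y_true, y_pred, labels)
for row in matrix:
    print(row)

for k, label in enumerate(labels):
    print(label, per_class_counts(matrix, k))
```

Note that a single off-diagonal cell plays two roles at once: an actual A predicted as B is a false negative for class A and a false positive for class B.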


## **How to use the confusion matrix to derive performance metrics**

**Accuracy**: Measures the proportion of __correct predictions__ (both positive and negative) among all predictions. It is calculated as:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$


 - __When to Use:__ When both false positives and false negatives are equally important, and the class distribution is balanced.


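 - __When to Avoid:__ In cases of class imbalance, where accuracy can be misleading: a model that always predicts the majority class scores high accuracy while never detecting the minority class.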
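
To see why accuracy can mislead under class imbalance, here is a small sketch with made-up counts. The `accuracy` helper and both scenarios are assumptions for illustration, not results from a real model.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN); returns 0.0 for an empty set."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical balanced test set: 50 positives, 50 negatives.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85

# Hypothetical imbalanced test set: 5 positives, 95 negatives,
# and a "model" that always predicts negative.
print(accuracy(tp=0, tn=95, fp=0, fn=5))    # 0.95
```

The second model never finds a single positive, yet its accuracy is higher than the first model's.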

---


**Precision**: Measures the proportion of __correctly predicted positive__ observations out of all observations predicted as positive. It is calculated as:

$$
\text{Precision} = \frac{TP}{TP + FP}
$$



 - __When to Use:__ Use precision when the cost of false positives is high. For example, in spam detection, you want to avoid classifying legitimate emails as spam. Precision focuses on the quality of positive predictions.


 - __When to Avoid:__ When false negatives are more concerning, as precision doesn’t consider how many true positives were missed (i.e., it does not penalize for false negatives).
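
A short sketch of the precision formula, using hypothetical spam-filter counts. Returning 0.0 when nothing is predicted positive is one possible convention and is an assumption here.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 if nothing was predicted positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0

# Hypothetical spam filter: 90 spam emails flagged correctly,
# 10 legitimate emails flagged by mistake.
print(precision(tp=90, fp=10))  # 0.9

# Precision takes no FN argument at all: whether the filter missed
# 5 spam emails or 500, the value above stays the same.
```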

---

**Recall (Sensitivity)**: Measures the proportion of __actual positives that were correctly identified__. It is calculated as:

$$
\text{Recall} = \frac{TP}{TP + FN}
$$


- __When to Use:__ Use recall when false negatives are costly. For example, in medical diagnoses, you want to ensure that no actual disease case goes undetected (minimizing false negatives). Recall is crucial when missing positives is more serious than falsely identifying negatives.


- __When to Avoid:__ When false positives are also costly, as recall does not account for how many false positives there are. A model that predicts positive for every sample achieves perfect recall.
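
The recall formula can be sketched the same way. The screening counts below are hypothetical, and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN); returns 0.0 if there are no actual positives."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0

# Hypothetical screening test: 20 patients have the disease, 980 do not.
# Model 1 detects 15 of the 20 cases.
print(recall(tp=15, fn=5))  # 0.75

# Model 2 flags every patient as positive: no case is missed,
# but all 980 healthy patients become false positives.
print(recall(tp=20, fn=0))  # 1.0
```

The second model reaches perfect recall even though 980 of its 1000 positive predictions are wrong, which is why recall is usually read together with precision.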

---

**F1 Score**: The harmonic mean of precision and recall, providing a balance between them:

$$
\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

 - __When to Use:__ When you need a balance between precision and recall, especially in imbalanced datasets where both false positives and false negatives are important.

 - __When to Avoid:__ If precision or recall is significantly more critical than the other.
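
The sketch below illustrates how the harmonic mean behaves when precision and recall differ sharply. The input values are hypothetical, and returning 0.0 when both are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """F1 = 2 * P * R / (P + R); returns 0.0 when both inputs are zero."""
    denominator = precision + recall
    return 2 * precision * recall / denominator if denominator else 0.0

# Hypothetical model that flags everything as positive:
# perfect recall, very poor precision.
p, r = 0.02, 1.0

print(f"arithmetic mean: {(p + r) / 2:.3f}")    # 0.510
print(f"F1 (harmonic):   {f1_score(p, r):.3f}")  # 0.039

# When precision and recall are equal, F1 equals that shared value.
print(f1_score(0.5, 0.5))  # 0.5
```

Unlike a simple average, the harmonic mean is pulled toward the smaller of the two values, so a model cannot earn a high F1 score by maximizing only one of them.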



__Exercise__

Return to the previous notebooks where we performed binary and multiclass classification. For each task:

1. Print the confusion matrix to visualize the classification outcomes.
2. Determine which performance metrics (accuracy, precision, recall, F1 score) are most appropriate for evaluating the task, and justify your choice (e.g., imbalanced classes, a focus on minimizing false positives).
3. Interpret the results of the metrics you’ve chosen. For example, determine whether the model is good at identifying certain classes or whether there are issues with false positives or false negatives.