# Evaluation Metrics

## 1. Confusion Matrix

<table>
 <tr> 
    <td colspan="2" rowspan="2">Confusion Matrix</td>
    <th colspan="2">Predicted Label</th> 
 </tr> 
 <tr> 
    <th>Positive</th> 
    <th>Negative</th> 
 </tr> 
 <tr> 
    <th rowspan="2">True Label</th> 
    <th>Positive</th> 
    <td>True Positive (TP)</td> 
    <td>False Negative (FN)</td> 
 </tr> 
 <tr> 
    <th>Negative</th> 
    <td>False Positive (FP)</td> 
    <td>True Negative (TN)</td> 
 </tr> 
 </table>

## 2. Evaluation Metrics

$$
\begin{align*} 
\text{Precision} & = \frac{TP}{TP + FP} \\ 
\text{Recall} & = \frac{TP}{TP + FN} \\ 
F1 & = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\ 
\text{Accuracy} & = \frac{TP + TN}{TP + FN + TN + FP} \\ 
\text{Specificity} & = \frac{TN}{TN + FP} \end{align*}
$$
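As a quick check of the formulas above, here is a minimal sketch that evaluates them from raw counts. The function name `classification_metrics` and the counts are hypothetical, chosen only for illustration, and the sketch assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Evaluate the five formulas above from confusion-matrix counts.

    Assumes every denominator is non-zero (no zero-division handling).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    specificity = tn / (tn + fp)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "specificity": specificity,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision is 40/50, recall is 40/60, accuracy is 70/100, and specificity is 30/40.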


In [None]:
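# Using scikit-learn: sklearn.metrics, sklearn.model_selection

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, confusion_matrix

# y_true and y_pred are placeholders for your ground-truth and predicted label arrays.
# normalize="true" divides each row by its total, so every row (true class) sums to 1.
cm = confusion_matrix(y_true=y_true, y_pred=y_pred, normalize="true")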


## 3. Using Seaborn to visualize the confusion matrix

Seaborn is a statistical data visualization library built on top of matplotlib.

In [2]:
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
# sns.heatmap(cm, annot=True, cmap='Blues', fmt=".2%")
# plt.title("title")
# plt.show()

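# To display the plot, you must also import matplotlib.pyplot (plt).
# heatmap: accepts a 2D matrix (here, the confusion matrix)
# annot=True: writes the value of each cell on the heatmap
# cmap: color map
# fmt=".2%": formats each value as a percentage with 2 decimal places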
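A self-contained sketch of the heatmap call is shown here. The matrix values and class names are hypothetical stand-ins for a row-normalized confusion matrix, and the axis labels assume rows are true labels and columns are predicted labels, as in the table in section 1.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical row-normalized confusion matrix (each row sums to 1).
cm = np.array([[0.75, 0.25],
               [0.10, 0.90]])
class_names = ["Positive", "Negative"]

fig, ax = plt.subplots(figsize=(4, 3))
sns.heatmap(cm, annot=True, cmap="Blues", fmt=".2%",
            xticklabels=class_names, yticklabels=class_names, ax=ax)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
ax.set_title("Normalized confusion matrix")
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() to display it.
```

Note that scikit-learn's `confusion_matrix` orders classes by sorted label value unless `labels=` is given, so the tick labels should follow whatever order the matrix was built with.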

## 4. ROC: Receiver Operating Characteristic (Copilot response)


__ROC__ stands for `Receiver Operating Characteristic`. An ROC curve is a graphical representation used to evaluate the __performance of a binary classification model__ by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings.

+ True Positive Rate (TPR): Also known as Sensitivity or Recall, it is the ratio of correctly predicted positive observations to all actual positives.

+ False Positive Rate (FPR): It is the ratio of incorrectly predicted positive observations to all actual negatives.
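In terms of the confusion-matrix counts from section 1, these two rates can be written as follows. TPR is the same quantity as Recall, and FPR is the complement of Specificity.

$$
\begin{align*}
\text{TPR} & = \frac{TP}{TP + FN} = \text{Recall} \\
\text{FPR} & = \frac{FP}{FP + TN} = 1 - \text{Specificity}
\end{align*}
$$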

### Interpreting an ROC Curve

+ __X-axis (FPR):__ Ranges from 0 to 1, representing the False Positive Rate.

+ __Y-axis (TPR):__ Ranges from 0 to 1, representing the True Positive Rate.

+ __Diagonal Line:__ Represents a random classifier; the area under this line is 0.5, indicating no discriminative power.

### Use of ROC Curves

1. __Model Comparison:__

    + ROC curves are particularly useful for comparing multiple models. A model whose curve lies closer to the top-left corner (high TPR at low FPR) generally performs better.

2. __Threshold Selection:__

    + ROC curves help in selecting the optimal threshold for classification by visualizing the trade-off between TPR and FPR.

3. __Performance Metric:__

    + The Area Under the ROC Curve (AUC-ROC) is a single scalar value that summarizes the overall ability of the model to discriminate between positive and negative classes. An AUC-ROC value closer to 1 indicates a better performing model.
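As a sketch of threshold selection, the example below picks the threshold that maximizes TPR minus FPR (Youden's J statistic). This criterion is an assumption for illustration: it is one common choice, not the only one, and the right threshold depends on the relative cost of false positives and false negatives. The labels and scores are made up.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and predicted scores, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.5, 0.6, 0.45, 0.7, 0.8, 0.9])

# roc_curve returns one (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Youden's J: the vertical distance between the ROC curve and the diagonal.
j_scores = tpr - fpr
best = np.argmax(j_scores)
best_threshold = thresholds[best]

print(f"threshold={best_threshold:.2f}, TPR={tpr[best]:.2f}, FPR={fpr[best]:.2f}")
```

A sample is then classified as positive when its score is greater than or equal to the chosen threshold.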

### Meaning of AUC-ROC Values

+ __AUC = 1:__ Perfect classifier.

+ __0.5 < AUC < 1:__ Better than random guessing.

+ __AUC = 0.5:__ Equivalent to random guessing.

+ __AUC < 0.5:__ Worse than random guessing (which may mean the model's predictions are inverted relative to the actual classes).
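The last case can be illustrated with a small sketch: if scores rank the classes the wrong way round, reversing them turns an AUC of `a` into `1 - a`. The labels and scores below are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels and scores that mostly rank negatives above positives.
y_true = np.array([0, 0, 1, 1])
bad_score = np.array([0.9, 0.6, 0.65, 0.2])

auc_bad = roc_auc_score(y_true, bad_score)

# Reversing the ranking (here via 1 - score) mirrors the AUC around 0.5.
auc_flipped = roc_auc_score(y_true, 1 - bad_score)

print(f"AUC with original scores: {auc_bad:.2f}")
print(f"AUC with reversed scores: {auc_flipped:.2f}")
```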

### Example of ROC Curve Use

Suppose we have a binary classifier predicting whether an email is spam or not. By varying the decision threshold, we can plot the ROC curve to see how well the classifier distinguishes between spam and non-spam emails. A higher AUC-ROC value would indicate a better spam detection model.
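A minimal sketch of this workflow is given here. It uses synthetic scores rather than a real spam classifier: the sample sizes, score distributions, and random seed are all assumptions made for illustration, with label 1 standing for spam.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic data (not a real spam model): spam emails (label 1) tend to
# receive higher scores than non-spam emails (label 0).
rng = np.random.default_rng(0)
y_true = np.concatenate([np.ones(200, dtype=int), np.zeros(200, dtype=int)])
y_score = np.concatenate([
    rng.normal(loc=0.65, scale=0.15, size=200),  # scores for spam
    rng.normal(loc=0.40, scale=0.15, size=200),  # scores for non-spam
])

# One (FPR, TPR) point per threshold, plus the area under the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC curve (AUC = {auc:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Random classifier")
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("ROC curve on synthetic spam scores")
ax.legend()
# In a script, call plt.show() to display the figure.
```

Note that `roc_curve` and `roc_auc_score` expect continuous scores or probabilities for the positive class, not hard 0/1 predictions.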


In [None]:
from sklearn.metrics import roc_curve, roc_auc_score



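# y_score holds predicted scores or probabilities for the positive class, not hard labels.
# fpr, tpr, thresholds = roc_curve(y_true=y_true, y_score=y_score)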