## **F1-Score**
F1-score is the **harmonic mean of Precision and Recall** and is useful when **false positives (FP) and false negatives (FN) are equally important**. It is given by:

$$
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$$

where:
- **Precision** = $ \frac{TP}{TP + FP} $ (How many predicted positives are actually positive?)
- **Recall** = $ \frac{TP}{TP + FN} $ (How many actual positives were correctly predicted?)
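
The formulas above translate directly into a few lines of Python. The following is a minimal sketch, not a library implementation: the function name `precision_recall_f1`, the example counts, and the choice to report a ratio with a zero denominator as 0.0 are all assumptions made for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    # Assumed convention: a ratio with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
p, r, f1 = precision_recall_f1(tp=30, fp=10, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

With these counts, Precision is 30/40 and Recall is 30/50, so the F1-score lands between the two but closer to the smaller value.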

### **When to Use F1-Score**  
✅ **Best for imbalanced datasets** (e.g., fraud detection, medical diagnosis).  
✅ **Use when false positives and false negatives have similar costs**.  
✅ **Useful when you care about both precision and recall** (e.g., spam detection).  

### **When NOT to Use F1-Score**  
❌ Not useful when you want to measure overall model ranking ability, because F1 is computed at a single decision threshold.  
❌ F1 does not consider true negatives (TN), which can be important in some problems.  


### **Why is the F1 Score a Better Metric When We Care About Both False Positives and False Negatives?**

To understand why the **F1 score** is a better metric when we care about both **False Positives (FP)** and **False Negatives (FN)**, let's break it down mathematically.

## **1. F1 Score Formula**
$$
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$$
Where:
- **Precision (P)** = Measures how many predicted positives are actually correct.
  $$
  P = \frac{TP}{TP + FP}
  $$
- **Recall (R)** = Measures how many actual positives were correctly identified.
  $$
  R = \frac{TP}{TP + FN}
  $$

## **2. Why Accuracy Fails in Imbalanced Datasets**
Accuracy is defined as:
$$
\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}
$$
- If the dataset is highly **imbalanced** (e.g., 95% negative, 5% positive), a model can predict **all negatives** and still achieve **95% accuracy**, even though it completely fails to identify any positives.

✅ **F1-score exposes this failure** because it is built from **TP, FP, and FN** and ignores TN: the all-negative model has TP = 0, so its Recall is 0 and its F1-score is 0.
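
The 95% / 5% scenario can be checked with a small sketch. The 100-sample dataset below is hypothetical and mirrors the split described above. F1 is computed in the equivalent form $2TP / (2TP + FP + FN)$, which avoids the undefined 0/0 Precision when nothing is predicted positive.

```python
# 1 = positive, 0 = negative: 5 positives among 100 samples (hypothetical data).
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the negative class.
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / (tp + fp + tn + fn)
recall = tp / (tp + fn)
# Algebraically equal to 2PR / (P + R), but defined even when TP + FP = 0.
f1 = 2 * tp / (2 * tp + fp + fn)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Accuracy comes out at 0.95 while Recall and F1 are both 0, which is the gap between the two metrics that this section describes.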

## **3. Why F1 Balances Precision and Recall?**
The **harmonic mean** in F1-score ensures that **both Precision and Recall must be high** to get a good F1 score.

- **If Precision is high but Recall is low** (few false positives, many false negatives), F1-score **drops**.
- **If Recall is high but Precision is low** (many false positives, few false negatives), F1-score **drops**.

### **Example Calculation**
Let's say we have a classification model predicting cancer:

| Model | True Positives (TP) | False Positives (FP) | False Negatives (FN) | Precision | Recall | F1 Score |
|--------|--------------|--------------|--------------|------------|--------|---------|
| **Model A** | 80 | 20 | 40 | 0.80 | 0.67 | **0.73** |
| **Model B** | 60 | 5 | 60 | 0.92 | 0.50 | **0.65** |

- **Model A** has better **Recall** (captures more positives).
- **Model B** has better **Precision** (fewer false alarms).
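- The **F1-score summarizes this trade-off in one number**: Model A scores higher (0.73 vs. 0.65) because Model B's low Recall (0.50) pulls its harmonic mean down despite its higher Precision.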
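
The table values can be reproduced from the TP, FP, and FN counts. This is a small sketch using only the numbers in the table; the dictionary names `models` and `results` are illustrative.

```python
# (TP, FP, FN) counts taken from the table above.
models = {"Model A": (80, 20, 40), "Model B": (60, 5, 60)}

results = {}
for name, (tp, fp, fn) in models.items():
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Round to two decimals to match the table.
    results[name] = (round(precision, 2), round(recall, 2), round(f1, 2))

for name, (precision, recall, f1) in results.items():
    print(f"{name}: precision={precision} recall={recall} f1={f1}")
```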

## **4. Harmonic-Mean Interpretation of F1 Score**
The F1 score is the **harmonic mean** of Precision and Recall, meaning:
$$
\frac{1}{F1} = \frac{1}{2} \left( \frac{1}{P} + \frac{1}{R} \right)
$$
- The **harmonic mean** ensures that if one value (Precision or Recall) is too low, the overall score is penalized heavily.
- Unlike the **arithmetic mean**, which could still be high if one value is much larger, the harmonic mean **forces both values to be high**.
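
To make the contrast concrete, the sketch below compares the arithmetic and harmonic means for a few Precision/Recall pairs. The pairs are hypothetical values chosen for illustration; `harmonic_mean` comes from Python's standard `statistics` module.

```python
from statistics import harmonic_mean, mean

# Hypothetical (Precision, Recall) pairs with a growing gap between the two.
pairs = [(0.9, 0.9), (0.9, 0.5), (0.9, 0.1)]

comparison = []
for p, r in pairs:
    arithmetic = mean([p, r])
    harmonic = harmonic_mean([p, r])  # equals 2PR / (P + R), i.e. the F1-score
    comparison.append((p, r, arithmetic, harmonic))
    print(f"P={p} R={r} arithmetic={arithmetic:.2f} harmonic(F1)={harmonic:.2f}")
```

For the pair (0.9, 0.1) the arithmetic mean is still 0.50, while the harmonic mean falls to 0.18, which is the heavy penalty described above.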

---

### **AUC** <a id="auc"></a>

AUC stands for **Area Under the Curve**. It refers to the area under the Receiver Operating Characteristic (ROC) curve, a graphical representation of the true positive rate (TPR) versus the false positive rate (FPR) at various threshold settings. The AUC metric measures the ability of a classifier to distinguish between classes. Specifically:

- **True Positive Rate (TPR)** or **Recall** or **Sensitivity**: The ratio of correctly predicted positive observations to all actual positives.
  
  $$
  \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}
  $$

- **False Positive Rate (FPR)**: The ratio of incorrectly predicted positive observations to all actual negatives.
  
  $$
  \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}
  $$

- **Specificity** (True Negative Rate): The proportion of actual negative instances that the model correctly identifies as negative.

  $$
  \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} = 1 - \text{FPR}
  $$
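
The three rates can be computed from one set of confusion-matrix counts. This is a minimal sketch; the function name `rates` and the counts are hypothetical, and it assumes both classes are present so that no denominator is zero.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, specificity) from confusion-matrix counts."""
    tpr = tp / (tp + fn)          # share of actual positives caught
    fpr = fp / (fp + tn)          # share of actual negatives flagged
    specificity = tn / (tn + fp)  # share of actual negatives cleared
    return tpr, fpr, specificity


# Hypothetical counts: 50 actual positives, 100 actual negatives.
tpr, fpr, specificity = rates(tp=40, fp=10, tn=90, fn=10)
print(f"TPR={tpr:.2f} FPR={fpr:.2f} specificity={specificity:.2f}")
```

The output confirms the identity Specificity = 1 - FPR for these counts.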

#### Understanding the ROC Curve

- The **ROC curve** plots TPR (on the y-axis) against FPR (on the x-axis) for different classification thresholds. Each point on the curve represents a TPR/FPR pair corresponding to a particular threshold.

- The **AUC** is the area under this curve. AUC can range from 0 to 1:
  - **AUC = 1**: Perfect model, perfectly distinguishes between positive and negative classes.
  - **AUC = 0.5**: Model has no discrimination capability (equivalent to random guessing).
  - **AUC < 0.5**: Model is worse than random guessing (indicating it might be consistently predicting the opposite class).
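
As a rough illustration of how the curve and its area arise, the sketch below sweeps the threshold over every distinct score, records one (FPR, TPR) point per threshold, and integrates with the trapezoidal rule. The labels, scores, and function names are hypothetical. In practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would normally be used instead.

```python
def roc_points(y_true, scores):
    """Return ROC points (FPR, TPR), one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step marks more samples as positive.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
auc = trapezoid_auc(points)
print(points)
print(f"AUC={auc:.3f}")
```

Here only one positive (score 0.6) is ranked below a negative (score 0.7), so the area is 8/9, between the random baseline of 0.5 and the perfect score of 1.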

![image.png](attachment:image.png)

#### Sensitivity and Specificity:

- **Inverse Relationship:** For a given model, sensitivity and specificity trade off against each other as the classification threshold moves. When one increases, the other tends to decrease. This reflects the inherent trade-off between the true positive and true negative rates.
- **Tuning via Threshold:** By adjusting the threshold value, we can control the balance between sensitivity and specificity. Lower thresholds lead to higher sensitivity (more true positives) at the expense of specificity (more false positives). Conversely, raising the threshold boosts specificity (fewer false positives) but sacrifices sensitivity (more false negatives).
- **FPR and Specificity Connection:** The False Positive Rate (FPR) is simply the complement of specificity (FPR = 1 – specificity), so higher specificity means lower FPR, and vice versa.
- **FPR Changes with TPR:** The True Positive Rate (TPR) and FPR are linked through the threshold as well. Lowering the threshold to increase TPR (more true positives) generally raises FPR (more false positives). Conversely, raising the threshold to lower FPR (fewer false positives) generally reduces TPR (fewer true positives).
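
The threshold trade-off can be seen on a toy example. The sketch below uses hypothetical labels and scores, evaluates sensitivity and specificity at three arbitrary thresholds, and treats a score at or above the threshold as a positive prediction.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when score >= threshold means 'positive'."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

by_threshold = {}
for threshold in (0.3, 0.5, 0.75):
    sens, spec = sensitivity_specificity(y_true, scores, threshold)
    by_threshold[threshold] = (sens, spec)
    print(f"threshold={threshold}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

As the threshold rises from 0.3 to 0.75 in this example, specificity climbs from 1/3 to 1 while sensitivity eventually drops from 1 to 2/3.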

**What AUC actually means**

An AUC of 0.75 means that if we pick one data point from the positive class and one from the negative class at random, there is a 75% chance the model rank-orders them correctly, i.e., it assigns the positive point a higher predicted probability than the negative point (assuming a higher predicted probability indicates the positive class).

| Index | Class | Probability |
|-------|-------|-------------|
| P1    | 1     | 0.95        |
| P2    | 1     | 0.90        |
| P3    | 0     | 0.85        |
| P4    | 0     | 0.81        |
| P5    | 1     | 0.78        |
| P6    | 0     | 0.70        |

Here we have 6 points: P1, P2, and P5 belong to class 1, while P3, P4, and P6 belong to class 0. The Probability column holds the model's predicted probability for each point. The question is: if we take two points from different classes, what is the probability that the model rank-orders them correctly?

We take every pair in which one point belongs to class 1 and the other to class 0. With 3 points in each class there are 3 × 3 = 9 such pairs, listed below.

| Pair   | isCorrect |
|--------|-----------|
| (P1,P3) | True      |
| (P1,P4) | True      |
| (P1,P6) | True      |
| (P2,P3) | True      |
| (P2,P4) | True      |
| (P2,P6) | True      |
| (P5,P3) | False     |
| (P5,P4) | False     |
| (P5,P6) | True      |

The isCorrect column indicates whether the pair is correctly rank-ordered by predicted probability, i.e., whether the class 1 point has a higher probability than the class 0 point. In 7 of the 9 pairs the class 1 point is ranked higher, so the AUC is 7/9 ≈ 0.78: there is roughly a 78% chance that the model correctly orders a randomly chosen pair of points from different classes.
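
The pair counting above can be reproduced in a few lines. This sketch uses the six probabilities from the table. The variable names are illustrative, and counting a tie as half a correct pair is an assumed convention that has no effect here because there are no ties.

```python
# Predicted probabilities from the table, grouped by true class.
positives = {"P1": 0.95, "P2": 0.90, "P5": 0.78}  # class 1
negatives = {"P3": 0.85, "P4": 0.81, "P6": 0.70}  # class 0

correct = 0.0
total = 0
for pos_name, pos_prob in positives.items():
    for neg_name, neg_prob in negatives.items():
        total += 1
        if pos_prob > neg_prob:
            correct += 1.0   # pair is rank-ordered correctly
        elif pos_prob == neg_prob:
            correct += 0.5   # assumed convention: a tie counts as half

auc = correct / total
print(f"{correct:.0f} of {total} pairs correct, AUC={auc:.3f}")
```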

---

### **Practical Example: Credit Card Fraud Detection (Imbalanced Dataset & AUC-ROC Score)**

Imagine you're building a **fraud detection model** for a bank. The goal is to classify transactions as **fraudulent (positive class)** or **legitimate (negative class)**.  

✅ The dataset is **highly imbalanced**:  
- **99.5%** of transactions are legitimate.  
- **0.5%** of transactions are fraudulent.  


## **Why is AUC-ROC the Right Metric Here?**
In fraud detection, **we care about ranking ability** rather than just a single classification threshold.  

- **We prioritize catching fraud cases** but can't afford too many false alarms (customers would get annoyed).  
- **We need a model that assigns higher scores to fraudulent transactions than to normal transactions**, even if we don’t set a strict decision threshold.  

AUC-ROC (**Area Under the Receiver Operating Characteristic Curve**) measures how well the model **ranks fraud cases higher than non-fraud cases**, making it **threshold-independent**.

### **Why Not Use Accuracy or F1 Score?**
- **Accuracy fails** because a model predicting all transactions as **"legitimate"** gets **99.5% accuracy**, but it's **useless** (it never catches fraud).
- **F1-score is not ideal** because it depends on a fixed threshold. We want to analyze the model's performance across **all thresholds**.
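
The difference between a threshold-dependent and a threshold-independent metric can be sketched on a toy dataset. Everything below is hypothetical: 2 fraud and 18 legitimate transactions with made-up scores, a far milder imbalance than 99.5% / 0.5% so that the numbers stay readable. AUC is computed by pair counting, and F1 uses the equivalent form $2TP / (2TP + FP + FN)$.

```python
# Hypothetical model scores (higher = more suspicious).
fraud_scores = [0.9, 0.4]
legit_scores = [0.6, 0.5, 0.3] + [0.1] * 15


def pairwise_auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at(threshold, pos, neg):
    """F1-score when a score >= threshold is flagged as fraud."""
    tp = sum(1 for s in pos if s >= threshold)
    fp = sum(1 for s in neg if s >= threshold)
    fn = len(pos) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


auc = pairwise_auc(fraud_scores, legit_scores)
f1_by_threshold = {t: f1_at(t, fraud_scores, legit_scores) for t in (0.35, 0.5, 0.8)}
# Accuracy of a model that labels every transaction as legitimate.
all_legit_accuracy = len(legit_scores) / (len(fraud_scores) + len(legit_scores))

print(f"AUC={auc:.3f}")
print({t: round(f1, 3) for t, f1 in f1_by_threshold.items()})
print(f"all-legitimate accuracy={all_legit_accuracy:.2f}")
```

On this toy data the F1-score moves between 0.4 and about 0.67 depending on the threshold, while the AUC is a single value (34/36) computed from the scores alone.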

## **How AUC-ROC Helps in Fraud Detection**
- **ROC Curve** plots **True Positive Rate (Recall) vs. False Positive Rate (FPR)** at different thresholds.
- **AUC (Area Under the Curve)** tells us **how well the model ranks fraud transactions higher than normal ones**.
- **An AUC close to 1** means the model ranks fraud cases much higher than legitimate ones.

For example:
- If **AUC = 0.95**, it means **95% of the time**, the model ranks a randomly chosen fraud case higher than a randomly chosen legitimate transaction.

## **Other Scenarios Where AUC-ROC is Crucial**
- **Medical Risk Prediction** (e.g., cancer risk scores) – The model should rank high-risk patients correctly.  
- **Search Engines & Recommendation Systems** – Rank relevant results higher.  
- **Loan Default Prediction** – Rank high-risk borrowers above low-risk ones.  
