
# 📊 Confusion Matrix

---

### **Definition**

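A **confusion matrix** is a table that summarizes how a classification model's predictions line up with the true labels.
It cross-tabulates **actual labels** against **predicted labels**, showing not only how often the model is right but also which kinds of errors it makes.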

---

### **Structure (Binary Classification)**

```
                Predicted
               0       1
Actual  0     TN      FP
        1     FN      TP
```

* **TP (True Positive):** Predicted 1, actually 1.
* **TN (True Negative):** Predicted 0, actually 0.
* **FP (False Positive):** Predicted 1, but actually 0. (Type I error)
* **FN (False Negative):** Predicted 0, but actually 1. (Type II error)
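
The four definitions above can be turned into a small counting routine. This is a minimal sketch in plain Python, assuming binary labels with `1` as the positive class; the function name `count_outcomes` and the sample labels are hypothetical and chosen only for illustration.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for binary labels; `positive` marks the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative (Type I error)
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive (Type II error)
            else:
                tn += 1  # predicted negative, actually negative
    return tn, fp, fn, tp


# Hypothetical labels, chosen only for illustration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = count_outcomes(y_true, y_pred)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=2 FP=1 FN=1 TP=2
```

The return order `(tn, fp, fn, tp)` follows the table read row by row. For binary 0/1 labels, scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` yields the counts in the same order.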

---

### **Derived Metrics**

Most common threshold-based classification metrics can be computed from the four cells of the confusion matrix:

1. **Accuracy**

$$
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
$$

2. **Precision**

$$
Precision = \frac{TP}{TP + FP}
$$

→ "Of predicted positives, how many were correct?"

3. **Recall (Sensitivity, TPR)**

$$
Recall = \frac{TP}{TP + FN}
$$

→ "Of actual positives, how many were captured?"

4. **Specificity (TNR)**

$$
Specificity = \frac{TN}{TN + FP}
$$

→ "Of actual negatives, how many were correctly identified?"

5. **F1-Score**

$$
F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}
$$
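
The five formulas above translate directly into code. The sketch below assumes the four counts are already known; the helper name `derived_metrics`, the example counts, and the choice to return `0.0` when a denominator is zero are assumptions made for illustration, not a fixed convention.

```python
def derived_metrics(tp, tn, fp, fn):
    """Compute the metrics above from the four confusion-matrix counts."""

    def safe_div(numerator, denominator):
        # Assumption: report 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, chosen only for illustration.
metrics = derived_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.700
# precision: 0.800
# recall: 0.667
# specificity: 0.750
# f1: 0.727
```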

---

### **Python Example**

```python
from sklearn.metrics import confusion_matrix, classification_report

y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)

# Detailed report
print("\nClassification Report:\n", classification_report(y_true, y_pred))
```

✅ Example Output:

```
Confusion Matrix:
[[3 1]
 [1 3]]

Classification Report:
              precision    recall  f1-score   support

           0       0.75      0.75      0.75         4
           1       0.75      0.75      0.75         4

    accuracy                           0.75         8
   macro avg       0.75      0.75      0.75         8
weighted avg       0.75      0.75      0.75         8
```
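
As a quick check, the class-1 row of the report can be reproduced by hand from the matrix above, reading it as TN = 3, FP = 1, FN = 1, TP = 3:

$$
Precision = \frac{3}{3 + 1} = 0.75, \quad Recall = \frac{3}{3 + 1} = 0.75, \quad F1 = 2 \cdot \frac{0.75 \cdot 0.75}{0.75 + 0.75} = 0.75
$$

The class-0 row follows the same way once class 0 is treated as the positive class, which swaps the roles of TP and TN and of FP and FN.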

---

### **Interview Questions**

* **Q1:** What is the difference between precision and recall in the confusion matrix?
  👉 Precision measures how many of the predicted positives are correct; recall measures how many of the actual positives the model captures.

* **Q2:** If FP is high, what does it mean?
  👉 The model predicts positives incorrectly (false alarms). Example: predicting healthy patients as sick.

* **Q3:** If FN is high, what does it mean?
  👉 The model misses actual positives. This is dangerous in medical diagnosis, where a sick patient would be reported as healthy.

* **Q4:** Which metrics would you prioritize in fraud detection?
  👉 Recall first (don’t miss fraud cases), then F1-score to balance precision and recall. Accuracy alone is misleading here because fraud cases are rare.
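
A small sketch shows why recall deserves attention in a setting like fraud detection. It assumes a hypothetical dataset of 1,000 transactions with 10 fraud cases and a trivial "model" that never flags fraud; none of these numbers come from a real system.

```python
# Hypothetical data: 1,000 transactions, 10 of them fraudulent (label 1).
y_true = [1] * 10 + [0] * 990
# A trivial "model" that never flags fraud.
y_pred = [0] * 1000

pairs = list(zip(y_true, y_pred))
tp = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 1)
fn = sum(1 for actual, predicted in pairs if actual == 1 and predicted == 0)
correct = sum(1 for actual, predicted in pairs if actual == predicted)

accuracy = correct / len(pairs)
# Guard against a dataset with no positive cases at all.
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

The model looks excellent by accuracy yet catches no fraud at all, which is exactly the failure that recall exposes.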

