# Confusion Matrix Explained

In this notebook, we will learn about the confusion matrix, a table that shows how a classification model's predictions compare with the actual labels.

## What is a Confusion Matrix?
- It is a table that cross-tabulates predicted labels against actual labels.
- It shows exactly where your model gets confused, that is, which class is mistaken for which.
- For binary classification it is a 2x2 grid; with N classes it grows to N x N.
- It's the foundation for calculating metrics like precision, recall, and F1 score.

## Confusion Matrix Structure

![Confusion Matrix Diagram](images/confusion_matrix_diagram.png)
<table style="margin: 20px auto; border-collapse: collapse; font-size: 18px;">
  <tr>
    <th style="border: 1px solid #ddd; padding: 10px;"></th>
    <th style="border: 1px solid #ddd; padding: 10px;">Predicted: No</th>
    <th style="border: 1px solid #ddd; padding: 10px;">Predicted: Yes</th>
  </tr>
  <tr>
    <th style="border: 1px solid #ddd; padding: 10px;">Actual: No</th>
    <td style="border: 1px solid #ddd; padding: 10px; background-color: #90EE90;">True Negative (TN)</td>
    <td style="border: 1px solid #ddd; padding: 10px; background-color: #FFB6C1;">False Positive (FP)</td>
  </tr>
  <tr>
    <th style="border: 1px solid #ddd; padding: 10px;">Actual: Yes</th>
    <td style="border: 1px solid #ddd; padding: 10px; background-color: #FFB6C1;">False Negative (FN)</td>
    <td style="border: 1px solid #ddd; padding: 10px; background-color: #90EE90;">True Positive (TP)</td>
  </tr>
</table>
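
The layout in the table above (actual classes as rows, predicted classes as columns, negative class first) is also what scikit-learn's `confusion_matrix` returns. The snippet below is a minimal sketch of reading the four cells from it. It assumes labels are encoded as 0 = No and 1 = Yes, and the `y_true` and `y_pred` lists are made-up toy data, not results from this notebook's model.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (hypothetical): 0 = "No" (negative), 1 = "Yes" (positive)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For a 2x2 matrix, ravel() flattens row by row: TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

Passing `labels=[0, 1]` makes the row and column order explicit instead of relying on the default sorted order.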

## Reading the Matrix
- ✅ **True Positive (TP):** Correctly predicted as positive.
- ✅ **True Negative (TN):** Correctly predicted as negative.
- ❌ **False Positive (FP):** Incorrectly predicted as positive (Type I error).
- ❌ **False Negative (FN):** Incorrectly predicted as negative (Type II error).

**💡 A perfect model** would only have TP and TN, with no FP or FN.
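
To make the four definitions above concrete, here is a small sketch that counts them by hand, without any library. The helper name `count_outcomes` and the toy label lists are illustrative assumptions, with 1 treated as the positive class.

```python
def count_outcomes(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for binary labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct only if the actual label is positive
            counts["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: correct only if the actual label is negative
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Toy data (hypothetical): 1 = positive, 0 = negative
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

outcomes = count_outcomes(y_true, y_pred)
print(outcomes)
```

Each prediction lands in exactly one of the four buckets, so the counts always add up to the number of samples.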

## Real Example: Email Spam Detection

![Email Spam Confusion Matrix](images/email_spam_confusion.png)
**Scenario:** Testing a spam detector on 1,000 emails, with spam as the positive class
- 🎯 **TP = 85:** Correctly identified spam emails
- ✅ **TN = 890:** Correctly identified normal emails
- 🚨 **FP = 15:** Normal emails marked as spam
- 😱 **FN = 10:** Spam emails that got through

**Accuracy:** (85 + 890) / 1000 = 97.5%
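
The same four counts also give precision, recall, and F1 score, the metrics mentioned earlier as built on the confusion matrix. The sketch below applies the standard formulas to the spam example's numbers. The variable names are chosen here for illustration.

```python
# Counts from the spam detection example
tp, tn, fp, fn = 85, 890, 15, 10

total = tp + tn + fp + fn

accuracy = (tp + tn) / total        # share of all emails classified correctly
precision = tp / (tp + fp)          # of emails flagged as spam, share that really is spam
recall = tp / (tp + fn)             # of actual spam emails, share that was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"Accuracy:  {accuracy:.3f}")   # 0.975
print(f"Precision: {precision:.3f}")  # 0.850
print(f"Recall:    {recall:.3f}")     # 0.895
print(f"F1 score:  {f1:.3f}")         # 0.872
```

Accuracy is 97.5%, yet precision and recall are noticeably lower, because the 890 true negatives dominate the accuracy figure.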

## Code Example: Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

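# Reuses the previous model's test labels and predictions:
# y_test and y_pred must already be defined before running this cell
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cm)

# Visualize with heatmap
# Rows are actual classes, columns are predicted classes; the tick labels
# assume the sorted class order is [negative, positive] (for example, 0 and 1)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])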
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

# Detailed report
print("\nDetailed Classification Report:")
print(classification_report(y_test, y_pred))

## Key Takeaway

> *"The confusion matrix reveals the full story, not just accuracy!"*

### Question:
- Looking at a confusion matrix, how would you identify if your model is biased towards positive or negative predictions?
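
As a hint for the question above, one simple heuristic (not a formal test) is to compare how many positives the model predicts (TP + FP) with how many actually exist (TP + FN). The difference equals FP - FN, so the larger error type shows which way the model leans. The helper `prediction_lean` below is a hypothetical name used only for this sketch, shown with the spam example's counts.

```python
def prediction_lean(tp, tn, fp, fn):
    """Compare predicted positives with actual positives.

    tn does not affect the comparison; it is accepted so that all four
    cells of the matrix can be passed in together.
    """
    predicted_pos = tp + fp   # everything in the "Predicted: Yes" column
    actual_pos = tp + fn      # everything in the "Actual: Yes" row
    if fp > fn:
        lean = "positive"     # over-predicts the positive class
    elif fn > fp:
        lean = "negative"     # over-predicts the negative class
    else:
        lean = "balanced"
    return predicted_pos, actual_pos, lean


# Counts from the spam detection example
result = prediction_lean(tp=85, tn=890, fp=15, fn=10)
print(result)
```

How large a gap counts as a meaningful bias depends on the dataset size and on the cost of each error type, so treat the label as a starting point for inspection.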