### **Concept 1: The Confusion Matrix**

Imagine you are building a cancer detection system.

  * If you predict "No Cancer" but the patient *has* cancer, the disease goes untreated and the patient may die. (False Negative - **FN**)
  * If you predict "Cancer" but the patient is healthy, they get a scare, but further testing can clear them. (False Positive - **FP**)
  * Standard "Accuracy" treats these two errors as equally costly, which is dangerous here. The Confusion Matrix breaks predictions down into four buckets: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
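
To make the accuracy problem concrete, here is a minimal sketch with made-up numbers: a hypothetical screening set of 100 patients, 5 of whom have cancer, and a "model" that always predicts "No Cancer". The data and variable names are assumptions for illustration only.

```python
# Hypothetical screening data: 100 patients, 5 of whom actually have cancer.
y_true = [1] * 5 + [0] * 95

# A useless "model" that always predicts "No Cancer" (0).
y_pred = [0] * 100

# Accuracy only counts how often the prediction matches the label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Every real cancer case is missed, yet accuracy still looks excellent.
missed_cases = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

print(f"accuracy = {accuracy:.2f}")      # accuracy = 0.95
print(f"missed cases = {missed_cases}")  # missed cases = 5
```

A single accuracy figure cannot distinguish this model from one that makes five harmless false alarms instead, which is why the four buckets are counted separately.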

**The Micro-Task:**
I will provide you with two small lists: ground truth labels (`y_true`) and your model's predictions (`y_pred`).

**Your Task:**
Write a Python function `calculate_confusion_matrix(y_true, y_pred)` using **only raw Python lists or NumPy** (no Scikit-Learn).
It should return a dictionary or tuple containing the counts for `{TP, TN, FP, FN}`.

Assume:

  * `1` = Positive Class
  * `0` = Negative Class

---

```python
# Input Data
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
```


```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])

# Beginner level: an explicit loop that makes the logic easy to follow.
# Assumes every label is either 0 (negative) or 1 (positive).
def calculate_confusion_matrix_manual(y_true, y_pred):
    tp, tn, fp, fn = 0, 0, 0, 0
    for true, pred in zip(y_true, y_pred):
        if true == 0 and pred == 0:
            tn += 1  # correctly predicted negative
        elif true == 1 and pred == 1:
            tp += 1  # correctly predicted positive
        elif true == 0 and pred == 1:
            fp += 1  # false alarm
        else:
            fn += 1  # missed positive (true == 1, pred == 0)
    return tp, tn, fp, fn
```


**Optimization (The "Pro" Way):**
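The loop is readable, but pure-Python loops are slow on large datasets (millions of rows) because every iteration runs through the interpreter. In Data Science, we avoid explicit loops where we can and use **Vectorization**: whole-array NumPy operations that run in compiled code.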

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1])


def calculate_confusion_matrix_numpy(y_true, y_pred):
    # Each comparison yields a boolean array. Combine them element-wise with &
    # (Python's `and` raises an error on arrays); np.sum then counts the True values.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp, tn, fp, fn
```
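
As an alternative sketch (not the approach used above), all four counts can be obtained in one pass by encoding each (true, predicted) pair as the integer `2 * y_true + y_pred` and counting the codes with `np.bincount`. The function name `confusion_counts_bincount` is made up for this example, and the trick assumes the labels are strictly `0` or `1`.

```python
import numpy as np

def confusion_counts_bincount(y_true, y_pred):
    """Hypothetical alternative: count all four cells in one pass.

    Assumes both inputs contain only the integers 0 and 1.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Encode each pair as a code in 0..3: 0 -> TN, 1 -> FP, 2 -> FN, 3 -> TP.
    codes = 2 * y_true + y_pred
    # minlength=4 keeps all four slots even if some outcome never occurs.
    tn, fp, fn, tp = np.bincount(codes, minlength=4)
    return int(tp), int(tn), int(fp), int(fn)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(confusion_counts_bincount(y_true, y_pred))  # (4, 3, 2, 1)
```

The result is in the same `(tp, tn, fp, fn)` order as the functions above, so it can serve as a quick cross-check on the sample data.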

### **Concept 2: Precision and Recall**

Now that we have the raw counts, we need to turn them into metrics. We rarely look at just one; we look at the trade-off between **Precision** and **Recall**.

  * **Precision:** "Of all the times the model screamed 'Cancer!', how often was it right?" (High Precision = Low False Positives).
    **The Formula:**
$$\text{Precision} = \frac{\text{True Positives}}{\text{Total Predicted Positives}} = \frac{TP}{TP + FP}$$

  * **Recall (Sensitivity):** "Of all the people who actually have cancer, how many did we find?" (High Recall = Low False Negatives).
    **The Formula:**
$$\text{Recall} = \frac{\text{True Positives}}{\text{Total Actual Positives}} = \frac{TP}{TP + FN}$$


**The Micro-Task:**
Using the `TP, TN, FP, FN` values you calculated in the previous step, write a function `calculate_metrics(tp, tn, fp, fn)` that returns a dictionary with `"precision"` and `"recall"`.

**Constraint:** Handle the "division by zero" edge case (e.g., if the model never predicts positive, `TP + FP` will be 0). If the denominator is 0, return 0.0.
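
To see why the guard matters, here is a small sketch of the failure mode with plain Python integers. The helper name `safe_divide` is hypothetical and used only for illustration; returning `0.0` follows the constraint stated above.

```python
def safe_divide(numerator, denominator):
    """Hypothetical helper: return 0.0 instead of failing when the denominator is 0."""
    return numerator / denominator if denominator > 0 else 0.0

# A model that never predicts the positive class: TP = 0 and FP = 0.
tp, fp = 0, 0

try:
    precision = tp / (tp + fp)
except ZeroDivisionError:
    precision = None  # plain Python integers raise an exception here

guarded_precision = safe_divide(tp, tp + fp)
print(precision, guarded_precision)  # None 0.0
```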



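```python
tp, tn, fp, fn = calculate_confusion_matrix_numpy(y_true, y_pred)

def calculate_metrics(tp, tn, fp, fn):
    # tn is part of the requested signature but is not needed for these two metrics.
    predicted_positives = tp + fp
    precision = float(tp / predicted_positives) if predicted_positives > 0 else 0.0

    actual_positives = tp + fn
    recall = float(tp / actual_positives) if actual_positives > 0 else 0.0

    # Return a dictionary, as the task specifies.
    return {"precision": precision, "recall": recall}

print(calculate_metrics(tp, tn, fp, fn))
# {'precision': 0.6666666666666666, 'recall': 0.8}
```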

### **Concept 3: F1 Score**

You now have two numbers: Precision (0.67) and Recall (0.8).

Which is better: Model A (P=0.9, R=0.1) or Model B (P=0.4, R=0.6)? With two numbers per model there is no obvious ranking: A is more precise, but B finds far more of the actual positives.

Enter the **F1 Score**. It is the *Harmonic Mean* of Precision and Recall. Unlike a simple average, the Harmonic Mean punishes extreme values. If *either* Precision or Recall drops to zero, the F1 Score tanks to zero. It forces the model to balance both.
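
A quick sketch shows the difference on the two models above. The function names are made up for this example; `harmonic_mean` returns `0.0` when both inputs are zero, which is an assumed convention.

```python
def arithmetic_mean(p, r):
    return (p + r) / 2

def harmonic_mean(p, r):
    # Harmonic mean of two numbers; defined here as 0.0 when both are 0.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

models = {"A": (0.9, 0.1), "B": (0.4, 0.6)}
for name, (p, r) in models.items():
    print(name, round(arithmetic_mean(p, r), 2), round(harmonic_mean(p, r), 2))
# A 0.5 0.18
# B 0.5 0.48
```

The simple average rates both models at 0.5, while the harmonic mean drops Model A to 0.18 because of its very low recall.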

**The Micro-Task:**
Write a function `calculate_f1(precision, recall)` using the values from your previous step.

  * **The Formula:**
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
  * **Constraint:** Handle the case where `Precision + Recall == 0`.


```python
def calculate_f1(precision, recall):
    total = precision + recall
    # Guard against 0/0 when both precision and recall are zero.
    return 2 * precision * recall / total if total > 0 else 0.0
```
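
As a usage sketch, the function can be applied to the sample data from Concept 1, whose counts work out to TP=4, FP=2, FN=1. The block redefines `calculate_f1` so it runs on its own, and adds a hypothetical helper `f1_from_counts` based on the equivalent identity F1 = 2TP / (2TP + FP + FN) as a cross-check.

```python
import math

def calculate_f1(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0

def f1_from_counts(tp, fp, fn):
    """Hypothetical shortcut: F1 = 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator > 0 else 0.0

tp, fp, fn = 4, 2, 1  # counts for the sample data in Concept 1
precision = tp / (tp + fp)
recall = tp / (tp + fn)

f1 = calculate_f1(precision, recall)
print(round(f1, 4))                                  # 0.7273
print(math.isclose(f1, f1_from_counts(tp, fp, fn)))  # True
```

Both routes give 8/11, so the F1 for this model sits between its precision (0.67) and recall (0.8), closer to the lower of the two.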