
# Precision, Recall, F1 & Confusion Matrix — Practice Notebook
 
You’ll compute confusion matrices and the metrics **precision**, **recall**, and **F1** for **binary** and **multiclass** settings and compare models in **imbalanced** scenarios.

**What you’ll practice**
- From a confusion matrix → precision/recall/F1 (binary & multiclass)
- From predictions → confusion matrix → metrics
- Interpreting metrics under imbalance



## 📎 Cheatsheet

**Binary confusion matrix (positive class = 1)**

|           | Pred=1 | Pred=0 |
|-----------|--------|--------|
| **True=1** | **TP**  | **FN**  |
| **True=0** | **FP**  | **TN**  |

- **Precision** = TP / (TP + FP): of the samples predicted positive, how many are actually positive?  
- **Recall** (TPR) = TP / (TP + FN): of the actual positives, how many did we find?  
- **F1** = 2 · (Precision · Recall) / (Precision + Recall), the harmonic mean of precision and recall  
- **Accuracy** = (TP + TN) / (TP + FP + FN + TN)
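
As a quick sanity check of these formulas, the sketch below plugs in hypothetical counts (TP=8, FP=2, FN=4, TN=86, chosen only for illustration). It also checks the equivalent count-based form F1 = 2·TP / (2·TP + FP + FN).

```python
# Hypothetical counts, chosen only to illustrate the formulas above.
TP, FP, FN, TN = 8, 2, 4, 86

precision = TP / (TP + FP)                    # 8 / 10
recall = TP / (TP + FN)                       # 8 / 12
f1 = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / (TP + FP + FN + TN)    # 94 / 100

# F1 written directly in terms of counts; it should match the value above.
f1_from_counts = 2 * TP / (2 * TP + FP + FN)  # 16 / 22

print(round(precision, 4), round(recall, 4), round(f1, 4), round(accuracy, 4))
```

Accuracy is high here mostly because of the 86 true negatives; precision, recall, and F1 ignore TN entirely.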

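**Multiclass (C classes)**: with rows = true class and columns = predicted class, for class *k* treated as “positive”  
- TP_k = CM[k,k]  
- FP_k = ∑_i CM[i,k] − TP_k  (column *k* minus the diagonal)  
- FN_k = ∑_j CM[k,j] − TP_k  (row *k* minus the diagonal)  
- TN_k = (total) − TP_k − FP_k − FN_k  
- Macro-avg = unweighted mean of the per-class metric over classes  
- Weighted-avg = mean of the per-class metric weighted by support (the number of true samples of class *k*, i.e. the sum of row *k*)  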
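
The per-class rules above can be sketched in a few lines of plain Python. The 3×3 matrix here is hypothetical (it is not one of the exercise matrices) and assumes rows = true class, columns = predicted class.

```python
# Hypothetical 3-class confusion matrix (rows = true class, columns = predicted class).
cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 2, 6],
]
total = sum(sum(row) for row in cm)

per_class = {}
for k in range(len(cm)):
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp   # column k minus the diagonal
    fn = sum(cm[k]) - tp                  # row k minus the diagonal
    tn = total - tp - fp - fn             # everything not involving class k
    per_class[k] = (tp, fp, fn, tn)
    print(f"class {k}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

For every class the four counts add up to the total number of samples, which is a handy check when doing the computation by hand.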



## Setup — Helper Functions (you can use these or do it by hand)

In [None]:

import numpy as np

def binary_confusion_counts(y_true, y_pred, pos=1):
    """Count TP, FP, FN, TN, treating `pos` as positive and every other label as negative."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    TP = int(((y_true == pos) & (y_pred == pos)).sum())
    FP = int(((y_true != pos) & (y_pred == pos)).sum())
    FN = int(((y_true == pos) & (y_pred != pos)).sum())
    TN = int(((y_true != pos) & (y_pred != pos)).sum())
    return TP, FP, FN, TN

def precision_recall_f1_from_counts(TP, FP, FN, TN):
    """Return (precision, recall, f1, accuracy); a metric with a zero denominator is reported as 0.0."""
    prec = TP / (TP + FP) if (TP + FP) else 0.0
    rec  = TP / (TP + FN) if (TP + FN) else 0.0
    f1   = 2*prec*rec / (prec + rec) if (prec + rec) else 0.0
    acc  = (TP + TN) / (TP + FP + FN + TN) if (TP + FP + FN + TN) else 0.0
    return prec, rec, f1, acc

def metrics_from_confusion_matrix_binary(cm, layout="tn_fp_fn_tp"):
    """Return ((precision, recall, f1, accuracy), (TP, FP, FN, TN)) for a 2x2 matrix.

    layout="tn_fp_fn_tp": [[TN, FP], [FN, TP]] (rows = true 0/1, columns = predicted 0/1),
        the layout printed in Part B and implied by the 2% positives in Part D.
    layout="tp_fn_fp_tn": [[TP, FN], [FP, TN]], the layout of the cheatsheet table.
    The layout is fixed by the caller rather than guessed from the numbers:
    picking whichever reading scores higher can silently swap the classes.
    """
    cm = np.asarray(cm)
    if cm.shape != (2, 2):
        raise ValueError("Binary CM must be 2x2")
    if layout == "tn_fp_fn_tp":
        TN, FP, FN, TP = (int(v) for v in cm.ravel())
    elif layout == "tp_fn_fp_tn":
        TP, FN, FP, TN = (int(v) for v in cm.ravel())
    else:
        raise ValueError(f"Unknown layout: {layout!r}")
    return precision_recall_f1_from_counts(TP, FP, FN, TN), (TP, FP, FN, TN)

def multiclass_per_class_counts(cm):
    cm = np.asarray(cm)
    C = cm.shape[0]
    totals = {}
    total = cm.sum()
    for k in range(C):
        TP = cm[k,k]
        FP = cm[:,k].sum() - TP
        FN = cm[k,:].sum() - TP
        TN = total - TP - FP - FN
        totals[k] = dict(TP=int(TP), FP=int(FP), FN=int(FN), TN=int(TN), support=int(cm[k,:].sum()))
    return totals

def macro_f1(cm):
    """Return (per-class counts, macro-averaged precision/recall/F1) for a square CM.

    Rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm)
    counts = multiclass_per_class_counts(cm)
    precs, recs, f1s = [], [], []
    for k in range(cm.shape[0]):
        c = counts[k]
        p, r, f1, _ = precision_recall_f1_from_counts(c['TP'], c['FP'], c['FN'], c['TN'])
        precs.append(p); recs.append(r); f1s.append(f1)
    # Macro average: unweighted mean of the per-class metrics.
    macro = dict(precision=float(np.mean(precs)),
                 recall=float(np.mean(recs)),
                 f1=float(np.mean(f1s)))
    return counts, macro

def pretty(v):
    """Round a metric to 4 decimals and return a plain Python float for printing."""
    return float(np.round(v, 4))



## Part A — From Confusion Matrix to Metrics (Binary)

Compute **precision**, **recall**, **F1**, and **accuracy** for each matrix.

### A1


In [None]:

import numpy as np
cm_A1 = np.array([[90, 10],
                  [20, 80]])
(metrics_A1, counts_A1) = metrics_from_confusion_matrix_binary(cm_A1)
print("TP,FP,FN,TN inferred:", counts_A1)
print("precision, recall, f1, accuracy:", tuple(map(pretty, metrics_A1)))



### A2 (imbalanced)

- Compute the same metrics. Discuss why **accuracy** can be misleading here.
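
One way to see the problem is a trivial baseline. The sketch below uses hypothetical labels (95 negatives, 5 positives, not the A2 matrix) and a "model" that always predicts the majority class. It assumes nothing beyond the accuracy and recall definitions from the cheatsheet.

```python
# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true_demo = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred_demo = [0] * 100

tp = sum(t == 1 and p == 1 for t, p in zip(y_true_demo, y_pred_demo))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true_demo, y_pred_demo))
correct = sum(t == p for t, p in zip(y_true_demo, y_pred_demo))

accuracy = correct / len(y_true_demo)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print("accuracy:", accuracy)  # high, even though no positive is ever found
print("recall:", recall)
```

The baseline looks strong on accuracy while finding none of the positives, which is why recall and precision are reported alongside accuracy when one class is rare.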


In [None]:

cm_A2 = np.array([[980, 20],
                  [ 40, 60]])
metrics_A2, counts_A2 = metrics_from_confusion_matrix_binary(cm_A2)
print("TP,FP,FN,TN inferred:", counts_A2)
print("precision, recall, f1, accuracy:", tuple(map(pretty, metrics_A2)))



### A3

- Compute metrics; which error type (FP or FN) dominates?


In [None]:

cm_A3 = np.array([[50, 50],
                  [10, 90]])
metrics_A3, counts_A3 = metrics_from_confusion_matrix_binary(cm_A3)
print("TP,FP,FN,TN inferred:", counts_A3)
print("precision, recall, f1, accuracy:", tuple(map(pretty, metrics_A3)))



## Part B — From Predictions to Confusion Matrix



### B1 — Build CM from predictions
Given  
`y_true = [1,1,0,1,0,0,1,0,1,0]`  
`y_pred = [1,0,0,1,0,1,1,0,0,0]`

- Construct the **confusion matrix** (assume positive class is 1).  
- Compute precision/recall/F1.
  
|           | Pred=1 | Pred=0 |
|-----------|--------|--------|
| **True=1** | **TP**  | **FN**  |
| **True=0** | **FP**  | **TN**  |

In [None]:

y_true = np.array([1,1,0,1,0,0,1,0,1,0])
y_pred = np.array([1,0,0,1,0,1,1,0,0,0])
TP,FP,FN,TN = binary_confusion_counts(y_true, y_pred, pos=1)
print("TP, FP, FN, TN:", TP, FP, FN, TN)
print("Confusion matrix [[TN, FP],[FN, TP]]:")
print(np.array([[TN, FP],[FN, TP]]))
print("precision, recall, f1, accuracy:", tuple(map(pretty, precision_recall_f1_from_counts(TP,FP,FN,TN))))
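
The same counting idea extends beyond two classes. The following is a minimal sketch with a hypothetical helper `confusion_matrix_from_labels` and made-up labels. It builds a matrix with rows = true class and columns = predicted class, the orientation assumed by the multiclass formulas in the cheatsheet.

```python
def confusion_matrix_from_labels(y_true, y_pred, labels):
    """Build a len(labels) x len(labels) matrix; rows = true, columns = predicted."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

# Hypothetical 3-class labels, for illustration only.
y_true_demo = ["cat", "dog", "cat", "bird", "dog", "bird", "cat"]
y_pred_demo = ["cat", "cat", "cat", "bird", "dog", "dog", "bird"]
cm_demo = confusion_matrix_from_labels(y_true_demo, y_pred_demo, ["bird", "cat", "dog"])
for row in cm_demo:
    print(row)
```

The order of `labels` fixes which row and column each class occupies, so it is worth stating explicitly whenever a matrix is shared.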



## Part C — Multiclass Confusion Matrix

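Consider a 3-class problem with the confusion matrix `cm_C` below (rows = true class, columns = predicted class).
- Compute **per-class** precision, recall, F1.  
- Compute **macro** and **weighted** precision/recall/F1.  
- Which averaging is most appropriate if classes are imbalanced?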
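
To build intuition for the averaging question, the sketch below compares macro and support-weighted recall on a hypothetical, deliberately imbalanced matrix (not `cm_C`). Support-weighted recall always equals overall accuracy, because the supports cancel the recall denominators.

```python
# Hypothetical imbalanced 3-class matrix (rows = true, columns = predicted).
cm = [
    [90, 5, 5],
    [4, 5, 1],
    [3, 2, 5],
]
n_classes = len(cm)
total = sum(sum(row) for row in cm)

supports = [sum(cm[k]) for k in range(n_classes)]        # true samples per class
recalls = [cm[k][k] / supports[k] for k in range(n_classes)]

macro_recall = sum(recalls) / n_classes                  # every class counts equally
weighted_recall = sum(s * r for s, r in zip(supports, recalls)) / total
accuracy = sum(cm[k][k] for k in range(n_classes)) / total

print("per-class recall:", [round(r, 4) for r in recalls])
print("macro recall:", round(macro_recall, 4))
print("weighted recall:", round(weighted_recall, 4))
print("accuracy:", round(accuracy, 4))
```

In this example the weighted figure is pulled toward the majority class, while the macro figure drops because the two small classes are recalled only half the time. Which behavior is preferable depends on whether minority classes matter as much as the majority.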


In [None]:

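cm_C = np.array([[30, 5, 5],
                 [ 4,25, 6],
                 [ 3, 7,20]])
counts, macro = macro_f1(cm_C)
print("Per-class counts (TP,FP,FN,TN) and metrics:")
per_class, supports = [], []
for k, v in counts.items():
    p, r, f1, _ = precision_recall_f1_from_counts(v['TP'], v['FP'], v['FN'], v['TN'])
    per_class.append((p, r, f1))
    supports.append(v['support'])
    print(f"class {k}: TP={v['TP']} FP={v['FP']} FN={v['FN']} TN={v['TN']} "
          f"| P={pretty(p)} R={pretty(r)} F1={pretty(f1)}")

# Weighted average: per-class metrics weighted by support (true samples per class).
w_p, w_r, w_f1 = np.average(per_class, axis=0, weights=supports)
weighted = dict(precision=w_p, recall=w_r, f1=w_f1)
print("Average (Macro):", {k: pretty(v) for k, v in macro.items()})
print("Average (Weighted):", {k: pretty(v) for k, v in weighted.items()})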



## Part D — Model Comparison Under Imbalance (Binary)

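The dataset has **2% positives** (rare). Two models yield the confusion matrices `cm_A` and `cm_B` defined in the code cell below, stored as `[[TN, FP], [FN, TP]]` (rows = true 0/1, columns = predicted 0/1), so the positive class is the second row.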

1) Compute precision/recall/F1 for each.  
2) Which model is preferable if **missing positives is very costly**?  
3) Which model is preferable if **false alarms are very costly**?
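
Questions 2 and 3 come down to how the two error types are priced. The sketch below assumes a simple linear cost model with hypothetical error counts and cost ratios (none of these numbers come from the exercise) to show how the preferred model can flip when the costs change.

```python
def total_cost(fp, fn, cost_fp, cost_fn):
    """Total misclassification cost under a simple linear cost model (an assumption)."""
    return fp * cost_fp + fn * cost_fn

# Hypothetical error counts for two models (not the matrices in this exercise).
model_x = {"fp": 5, "fn": 12}    # conservative: few false alarms, many misses
model_y = {"fp": 40, "fn": 3}    # aggressive: many false alarms, few misses

# Scenario 1: a missed positive costs 20x as much as a false alarm.
x_miss_costly = total_cost(**model_x, cost_fp=1, cost_fn=20)
y_miss_costly = total_cost(**model_y, cost_fp=1, cost_fn=20)

# Scenario 2: a false alarm costs 20x as much as a missed positive.
x_alarm_costly = total_cost(**model_x, cost_fp=20, cost_fn=1)
y_alarm_costly = total_cost(**model_y, cost_fp=20, cost_fn=1)

print("misses costly:      X =", x_miss_costly, " Y =", y_miss_costly)
print("false alarms costly: X =", x_alarm_costly, " Y =", y_alarm_costly)
```

When misses are expensive, the higher-recall model tends to win; when false alarms are expensive, the higher-precision model does. The same reasoning applies to the two models in this part once their counts are read off.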

  
|           | Pred=1 | Pred=0 |
|-----------|--------|--------|
| **True=1** | **TP**  | **FN**  |
| **True=0** | **FP**  | **TN**  |


In [None]:

cm_A = np.array([[970, 10],
                 [ 15,  5]])
cm_B = np.array([[930, 50],
                 [  2, 15]])

mA, cA = metrics_from_confusion_matrix_binary(cm_A)
mB, cB = metrics_from_confusion_matrix_binary(cm_B)
print("Model A  TP,FP,FN,TN:", cA, "| P,R,F1,Acc:", tuple(map(pretty, mA)))
print("Model B  TP,FP,FN,TN:", cB, "| P,R,F1,Acc:", tuple(map(pretty, mB)))
